
Modern Engineering Mathematics


Abul Hasan Siddiqi


Mohamed Al-Lawati
Messaoud Boulbrachene
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks
does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion
of MATLAB® software or related products does not constitute endorsement or sponsorship by The
MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20171116

International Standard Book Number-13: 978-1-4987-1205-7 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Prof. A. H. Siddiqi would like to dedicate the book to
his wife, Prof. Azra H. Siddiqi.
Prof. M. Al-Lawati dedicates the book to his spouse.
Prof. M. Boulbrachene dedicates the book to his family.
Contents

Preface xvii

Acknowledgments xxiii

1 Matrices for Engineers 1

1.1 Vectors in R2 and R3 . . . . . . . . . . . . . . . . . . . . . 2


1.1.1 Dot and Cross Products . . . . . . . . . . . . . . . 5
1.1.2 Linear Dependence and Independence . . . . . . . . 9
1.1.3 Gram-Schmidt Orthogonalization Process . . . . . . 10
1.2 Basic Concepts of Matrices . . . . . . . . . . . . . . . . . . 11
1.2.1 Linear Equations and Matrices . . . . . . . . . . . . 20
1.2.2 Rank of Matrix . . . . . . . . . . . . . . . . . . . . 22
1.2.3 Special Classes of Matrices . . . . . . . . . . . . . . 22
1.2.4 Echelon Form of Matrix . . . . . . . . . . . . . . . . 25
1.3 Applications of Matrices to Real World Problems . . . . . . 26
1.3.1 Modeling of Temperature Distribution by Matrices 26
1.3.2 Modeling of Traffic Flow by Matrices . . . . . . . . 28
1.3.3 Matrices for Chemical Balance Equations . . . . . . 30
1.3.4 Modeling by Matrix Equation in Business . . . . . . 31
1.3.5 Role of Matrices in Electrical Networks . . . . . . . 33
1.4 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 35
1.4.2 Determinants of Matrices . . . . . . . . . . . . . . . 36
1.4.3 Properties of Determinants . . . . . . . . . . . . . . 37
1.4.4 Cramer’s Rule: Application of Determinants to Solve
Matrix Equations . . . . . . . . . . . . . . . . . . . 40
1.5 Inverse of Matrix and Its Computation . . . . . . . . . . . . 44
1.6 Eigenvalue Problems for Square Matrices . . . . . . . . . . 47
1.6.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . 47
1.6.2 Diagonalization and Similar Matrices . . . . . . . . 53
1.6.3 Orthogonalization and Diagonalization of 2 × 2
Matrices . . . . . . . . . . . . . . . . . . . . . . . . 55
1.7 Miscellaneous Applications . . . . . . . . . . . . . . . . . . 62
1.7.1 Digital Image Processing . . . . . . . . . . . . . . . 62
1.7.2 Matrices in Digital Image Compression . . . . . . . 63


1.7.3 Cryptography with Matrices . . . . . . . . . . . . . 65


1.7.4 Transform Coding . . . . . . . . . . . . . . . . . . . 68
1.7.5 Markov Matrix and Markov Process . . . . . . . . . 70
1.8 Introduction to MATLAB® and MATHEMATICA . . . . . 75
1.8.1 MATLAB for Matrices . . . . . . . . . . . . . . . . 75
1.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
1.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . 111

Bibliography 113

2 Differential Equations 115

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 116


2.1.1 Definitions and Terminology . . . . . . . . . . . . . 117
2.2 Introduction to Mathematical Modelling . . . . . . . . . . . 126
2.2.1 Population Dynamics (Exponential and Logistic
Model) . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.2.2 Radioactive Decay . . . . . . . . . . . . . . . . . . . 128
2.2.3 Carbon Dating . . . . . . . . . . . . . . . . . . . . . 129
2.2.4 Newton’s Law of Cooling . . . . . . . . . . . . . . . 129
2.2.5 Spread of Diseases and Rumors . . . . . . . . . . . 129
2.2.6 Series Circuit . . . . . . . . . . . . . . . . . . . . . 130
2.2.7 Draining Tank . . . . . . . . . . . . . . . . . . . . . 131
2.2.8 Spring and Mass System . . . . . . . . . . . . . . . 132
2.2.9 Mixture of Salt and Payment of Loan . . . . . . . 133
2.2.10 Predator-Prey Model . . . . . . . . . . . . . . . . . 134
2.2.11 Model of Groundwater Contaminant Source . . . . 135
2.2.12 Heart Pacemaker . . . . . . . . . . . . . . . . . . . 136
2.2.13 X-Ray and Beer’s Law . . . . . . . . . . . . . . . . 136
2.2.14 Model of Spreading Information . . . . . . . . . . . 137
2.2.15 Model for Circulation of Money . . . . . . . . . . . 138
2.3 MATLAB and MATHEMATICA for Differential Equations 138
2.3.1 Solving Ordinary Differential Equations (ODEs)
by MATLAB . . . . . . . . . . . . . . . . . . . . . . 138
2.3.2 Solving Differential Equation by MATHEMATICA 141
2.4 Methods for Solving First Order Linear Differential Equations 143
2.4.1 Method of Separation of Variables . . . . . . . . . . 143
2.4.2 Linear Equations . . . . . . . . . . . . . . . . . . . 145
2.4.3 Exact Equations . . . . . . . . . . . . . . . . . . . . 148
2.5 Methods for Solving Higher Order Differential Equations . . 151
2.5.1 Initial Value and Boundary Value Problems . . . . 151
2.5.2 Homogeneous Equations . . . . . . . . . . . . . . . 153
2.5.3 Non-homogeneous Equations . . . . . . . . . . . . . 156
2.5.4 Reduction of Order . . . . . . . . . . . . . . . . . . 158

2.5.5 Homogeneous Linear Equations with Constant


Coefficients . . . . . . . . . . . . . . . . . . . . . . . 159
2.5.6 Method of Undetermined Coefficients . . . . . . . . 162
2.5.7 Method of Variation of Parameters . . . . . . . . . 170
2.5.8 Cauchy-Euler Equation . . . . . . . . . . . . . . . . 172
2.6 Solution of Engineering Problems Modeled by Differential
Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
2.6.1 Population Dynamics . . . . . . . . . . . . . . . . . 174
2.6.2 Radioactive Decay . . . . . . . . . . . . . . . . . . . 179
2.6.3 Carbon Dating . . . . . . . . . . . . . . . . . . . . . 179
2.6.4 Newton’s Law of Cooling . . . . . . . . . . . . . . . 181
2.6.5 Spread of Diseases, Technologies and Rumor . . . . 183
2.6.6 Series Circuit . . . . . . . . . . . . . . . . . . . . . 185
2.6.7 Draining Tank . . . . . . . . . . . . . . . . . . . . . 186
2.6.8 Spring and Mass . . . . . . . . . . . . . . . . . . . . 186
2.6.9 Mixture of Salt and Payment of Loan . . . . . . . . 187
2.7 Laplace Transform for Linear Differential Equations . . . . 190
2.7.1 Introduction to Laplace Transform . . . . . . . . . . 190
2.7.2 Translation Theorems . . . . . . . . . . . . . . . . . 199
2.7.3 Inverse Laplace Transform . . . . . . . . . . . . . . 201
2.7.4 Step and Impulse Functions . . . . . . . . . . . . . 204
2.7.5 Some Additional Properties . . . . . . . . . . . . . 207
2.7.6 Application to Differential and Integral Equations . 213
2.8 Series Solution of Differential Equations . . . . . . . . . . . 223
2.8.1 Review of Properties of Power Series . . . . . . . . 223
2.8.2 Solution about Ordinary Point . . . . . . . . . . . . 226
2.8.3 Solution about Regular Singular Points: The Method
of Frobenius . . . . . . . . . . . . . . . . . . . . . . 229
2.8.4 Bessel’s Equation . . . . . . . . . . . . . . . . . . . 232
2.8.5 Legendre’s Equation . . . . . . . . . . . . . . . . . . 234
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
2.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . 246

Bibliography 249

3 Vector Calculus 251

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 251


3.2 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
3.2.1 Scalar Product . . . . . . . . . . . . . . . . . . . . . 256
3.2.2 Cross Product . . . . . . . . . . . . . . . . . . . . . 259
3.3 Differential Calculus of Vector Fields . . . . . . . . . . . . . 265
3.3.1 Curves . . . . . . . . . . . . . . . . . . . . . . . . . 265
3.3.2 Distances . . . . . . . . . . . . . . . . . . . . . . . . 266
3.4 Integration in Vector Fields . . . . . . . . . . . . . . . . . . 282

3.4.1 Line Integrals . . . . . . . . . . . . . . . . . . . . . 282


3.4.2 Surface Integrals . . . . . . . . . . . . . . . . . . . . 289
3.5 Fundamental Theorems of Vector Calculus . . . . . . . . . . 293
3.5.1 Theorem of Green and Ostrogradski . . . . . . . . . 294
3.5.2 Divergence Theorem of Gauss . . . . . . . . . . . . 297
3.5.3 Theorem of Stokes . . . . . . . . . . . . . . . . . . . 302
3.6 Applications of Vector Calculus to Engineering Problems . 305
3.6.1 Elements of Vector Calculus and Physical World . . 305
3.6.2 Applications of Line Integrals . . . . . . . . . . . . 314
3.6.3 Applications of Surface Integrals . . . . . . . . . . . 317
3.6.4 Applications of Gauss Divergence Theorem . . . . . 321
3.6.5 Application of Stokes Theorem . . . . . . . . . . . . 324
3.6.6 Example of Planar Fluid Flow . . . . . . . . . . . . 325
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
3.8 Suggestion for Further Reading . . . . . . . . . . . . . . . . 332

Bibliography 335

4 Fourier Methods and Integral Transforms 337

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 337


4.2 Orthonormal Systems and Fourier Series . . . . . . . . . . . 338
4.2.1 Orthonormal Systems . . . . . . . . . . . . . . . . . 338
4.2.2 Fourier Series . . . . . . . . . . . . . . . . . . . . . 345
4.2.3 Further Properties of Fourier Series . . . . . . . . . 357
4.3 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . 366
4.3.1 Basic Properties of Fourier Transform . . . . . . . . 367
4.3.2 Convolution . . . . . . . . . . . . . . . . . . . . . . 374
4.3.3 Discrete Fourier Transform . . . . . . . . . . . . . . 376
4.4 Integral Transforms . . . . . . . . . . . . . . . . . . . . . . 377
4.5 Sturm-Liouville Problems . . . . . . . . . . . . . . . . . . . 378
4.5.1 Regular Sturm-Liouville Problems . . . . . . . . . . 378
4.6 Application of Fourier Methods to Signal Analysis . . . . . 381
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
4.8 Suggestion for Further Reading . . . . . . . . . . . . . . . . 385

Bibliography 387

5 Applied Partial Differential Equations 389

5.1 Introduction to Partial Differential Equations . . . . . . . . 389


5.2 Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . 395
5.2.1 Solution Using Fourier Series . . . . . . . . . . . . . 396
5.2.2 Solution Using Fourier Transform . . . . . . . . . . 397
5.2.3 Solution Using Laplace Transform . . . . . . . . . . 398
5.3 Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . 400

5.3.1 Solution Using Fourier Series . . . . . . . . . . . . . 400


5.3.2 Solution Using Fourier Transform . . . . . . . . . . 402
5.3.3 Solution Using Laplace Transform . . . . . . . . . . 403
5.4 Laplace Equation . . . . . . . . . . . . . . . . . . . . . . . . 404
5.4.1 Solution Using Fourier Series . . . . . . . . . . . . . 407
5.4.2 Solution Using Fourier Transform . . . . . . . . . . 411
5.4.3 Solution Using Laplace Transform . . . . . . . . . . 412
5.5 Poisson Equation . . . . . . . . . . . . . . . . . . . . . . . . 413
5.6 Simulation of Heat Equation, Wave Equation and Laplace
Equation by MATLAB . . . . . . . . . . . . . . . . . . . . . 414
5.7 Solving Partial Differential Equation by MATHEMATICA . 430
5.8 Practical Applications in Physics and Mechanics . . . . . . 433
5.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
5.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . 441

Bibliography 443

6 Algorithmic Optimization 445

6.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 446


6.1.1 Gradient Method . . . . . . . . . . . . . . . . . . . 446
6.2 Analysis of Quadratic Functions . . . . . . . . . . . . . . . 447
6.2.1 Newton’s Method . . . . . . . . . . . . . . . . . . . 450
6.2.2 Line Search Descent Algorithm . . . . . . . . . . . . 451
6.2.3 The Method of Steepest Descent . . . . . . . . . . . 451
6.2.4 Conjugate Gradient Method . . . . . . . . . . . . . 452
6.3 Linear Programming . . . . . . . . . . . . . . . . . . . . . . 454
6.4 Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . 456
6.4.1 Tableau Rules . . . . . . . . . . . . . . . . . . . . . 458
6.5 Complementarity Problems . . . . . . . . . . . . . . . . . . 460
6.5.1 Problem Statement . . . . . . . . . . . . . . . . . . 460
6.6 Variational Inequalities . . . . . . . . . . . . . . . . . . . . 462
6.6.1 Variational Inequality Problem . . . . . . . . . . . . 463
6.6.2 Systems of Equations . . . . . . . . . . . . . . . . . 463
6.6.3 Optimization and Variational Inequalities . . . . . . 463
6.7 Queuing Theory . . . . . . . . . . . . . . . . . . . . . . . . 464
6.7.1 Queue . . . . . . . . . . . . . . . . . . . . . . . . . . 466
6.7.2 Queue . . . . . . . . . . . . . . . . . . . . . . . . . . 469
6.8 Iterative Methods and Pre-conditioning . . . . . . . . . . . 471
6.8.1 Norms of Vectors and Matrices . . . . . . . . . . . . 471
6.8.2 General Iterative Method . . . . . . . . . . . . . . . 473
6.8.3 Jacobi Iterative Method . . . . . . . . . . . . . . . . 474
6.8.4 The Gauss-Seidel Method . . . . . . . . . . . . . . . 476
6.9 Krylov Methods . . . . . . . . . . . . . . . . . . . . . . . . 477
6.9.1 Arnoldi’s Orthogonalization Method . . . . . . . . . 478

6.9.2 Arnoldi’s Method for Linear Systems . . . . . . . . 478


6.9.3 Conjugate Gradient Method . . . . . . . . . . . . . 479
6.9.4 Preconditioned Conjugate Gradient Method . . . . 484
6.9.5 Generalized Minimum Residual Method (GMRES) 486
6.10 Multi-grid Methods . . . . . . . . . . . . . . . . . . . . . . 488
6.10.1 Multi-grid Cycles . . . . . . . . . . . . . . . . . . . 488
6.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
6.12 Suggestion for Further Reading . . . . . . . . . . . . . . . . 492

Bibliography 493

7 Computational Numerical Methods in Engineering 495

7.1 Introduction to Numerical Differentiation . . . . . . . . . . 495


7.1.1 Introduction to Finite Difference Methods . . . . . 495
7.1.2 Taylor’s Formula . . . . . . . . . . . . . . . . . . . . 496
7.1.3 Finite Differences for Function f of Two Variables . 498
7.1.4 Application to Partial Differential Equations . . . . 504
7.1.4.1 Poisson’s Problem with Dirichlet Boundary
Conditions . . . . . . . . . . . . . . . . . . 504
7.1.4.2 Finite Difference Methods for Parabolic
Problems . . . . . . . . . . . . . . . . . . . 506
7.1.4.3 Finite Difference Methods for Hyperbolic
Problems . . . . . . . . . . . . . . . . . . . 510
7.2 Finite Element in One Dimension . . . . . . . . . . . . . . . 512
7.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
7.4 Suggestion for Further Reading . . . . . . . . . . . . . . . . 519

Bibliography 521

8 Complex Analysis 523

8.1 Motivation and Historical Development for Complex Analysis 523


8.2 Functions of Complex Variables . . . . . . . . . . . . . . . . 525
8.2.1 Complex Numbers . . . . . . . . . . . . . . . . . . . 525
8.2.2 Geometrical Representation of Complex Numbers . 526
8.2.3 Sets in Complex Plane . . . . . . . . . . . . . . . . 533
8.2.3.1 Complex Sequences and Series . . . . . . . 536
8.2.3.2 Functions of Complex Variable . . . . . . 537
8.2.3.3 Cauchy-Riemann Equations . . . . . . . . 541
8.3 Complex Integration . . . . . . . . . . . . . . . . . . . . . . 546
8.4 Residues and Residue Theorem . . . . . . . . . . . . . . . . 558
8.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 558
8.4.2 Singularities and Residues . . . . . . . . . . . . . . 559
8.5 Application of Residue Theory for Evaluation of Real Integrals 569

8.5.1 Evaluation of Integrals Involving Trigonometric


Functions . . . . . . . . . . . . . . . . . . . . . . . . 570
8.5.2 Evaluation of Several Integrals . . . . . . . . . . . . 573
8.6 Conformal Mappings . . . . . . . . . . . . . . . . . . . . . . 576
8.6.1 Complex Functions as Mappings . . . . . . . . . . . 576
8.6.2 Conformal Mappings . . . . . . . . . . . . . . . . . 579
8.6.3 Möbius Transforms . . . . . . . . . . . . . . . . . . 581
8.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 583
8.7.1 Electrostatics Potential . . . . . . . . . . . . . . . . 583
8.7.2 Heat Flow . . . . . . . . . . . . . . . . . . . . . . . 585
8.7.3 Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . 586
8.7.4 Tomography . . . . . . . . . . . . . . . . . . . . . . 589
8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
8.9 Suggestion for Further Reading . . . . . . . . . . . . . . . . 595

Bibliography 597

9 Inverse Problems 599

9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 600


9.2 Inverse Problems in Pre-calculus . . . . . . . . . . . . . . . 601
9.2.1 Inverse Problem in Torricelli’s Law . . . . . . . . . 601
9.2.2 Inverse Problem in Projectile Motion . . . . . . . . 601
9.2.3 Inverse Problem in Scattering . . . . . . . . . . . . 602
9.2.4 Inverse Problem of Location and Identification . . . 602
9.2.5 Inverse Problem Related to Eigenvalues . . . . . . . 603
9.3 Inverse Problems in Calculus . . . . . . . . . . . . . . . . . 603
9.3.1 Inverse Problem in Draining . . . . . . . . . . . . . 603
9.3.2 An Inverse Problem in Hanging Cable Models . . . 604
9.3.3 Inverse Problems in Study of Trajectories . . . . . . 604
9.4 Inverse Problems in Matrix Equations . . . . . . . . . . . . 605
9.4.1 Inverse Causation Problem . . . . . . . . . . . . . . 605
9.4.2 Identification Problems . . . . . . . . . . . . . . . . 606
9.4.3 Inverse Problem for Eigenvalues and Eigenvectors . 607
9.4.4 Least Squares Solutions to Inverse Problems . . . . 607
9.5 Inverse Problems in Differential Equations . . . . . . . . . . 608
9.5.1 Inverse Problems for Mixing Problems . . . . . . . 608
9.5.2 Inverse Problem in Newton’s Law of Falling . . . . 609
9.5.3 Inverse Problem in Newton’s Cooling Law . . . . . 610
9.5.4 Inverse Problem in Finance . . . . . . . . . . . . . . 611
9.5.5 Inverse Problem in Carbon Dating . . . . . . . . . . 611
9.5.6 Inverse Problem in Population Growth . . . . . . . 612
9.6 Inverse Problem in Partial Differential Equations . . . . . . 612
9.6.1 Inverse Problem for Heat Equation . . . . . . . . . 612
9.6.2 Inverse Scattering Problem . . . . . . . . . . . . . . 613

9.6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . 614


9.6.4 Inverse Problem for Wave Equation . . . . . . . . . 614
9.6.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . 614
9.6.6 Financial Mathematics . . . . . . . . . . . . . . . . 616
9.7 Inverse Problems of Image Processing . . . . . . . . . . . . 617
9.7.1 Fundamental Steps in Digital Image Processing . . 617
9.7.2 Introduction to Medical Imaging . . . . . . . . . . . 618
9.7.3 Tomography . . . . . . . . . . . . . . . . . . . . . . 620
9.8 Seismic Tomography . . . . . . . . . . . . . . . . . . . . . . 620
9.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
9.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . 622

Bibliography 623

10 Wavelets 627

10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 627


10.2 Overview of Wavelet Methods . . . . . . . . . . . . . . . . . 628
10.2.1 Definition and Example of Wavelets . . . . . . . . . 628
10.2.2 Multiresolution Analysis . . . . . . . . . . . . . . . 635
10.3 Applications of Wavelets . . . . . . . . . . . . . . . . . . . . 639
10.3.1 Applications of Wavelets to Biometrics . . . . . . . 639
10.3.2 CAT Scan . . . . . . . . . . . . . . . . . . . . . . . 640
10.3.3 Seismic Tomography . . . . . . . . . . . . . . . . . 643
10.3.4 Variants of Wavelets in Medical Imaging . . . . . . 644
10.3.5 Applications in Power Systems (Figure 10.17) . . . 647
10.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
10.5 Suggestion for Further Reading . . . . . . . . . . . . . . . . 649

Bibliography 651

11 Miscellaneous Topics Used for Engineering Problems 655

11.1 Fractals in Engineering Science . . . . . . . . . . . . . . . . 656


11.1.1 Fractals and Interaction with Wavelets . . . . . . . 656
11.1.2 Fractal Image Processing . . . . . . . . . . . . . . . 665
11.1.3 Differential Equations on Fractals . . . . . . . . . . 670
11.1.4 Chaos and Fractals . . . . . . . . . . . . . . . . . . 671
11.2 Introduction to Time Series . . . . . . . . . . . . . . . . . . 671
11.2.1 Examples of Time Series . . . . . . . . . . . . . . . 672
11.2.2 Wavelets and Fractals in Time Series Analysis . . . 676
11.2.3 Prediction of Time Series Behavior Using Wavelets
and Fractals . . . . . . . . . . . . . . . . . . . . . . 680
11.2.4 Fractal Dimension and Predictability . . . . . . . . 682
11.3 Introduction to Neural Networks . . . . . . . . . . . . . . . 684
11.4 Introduction to Fuzzy and Neuro-fuzzy . . . . . . . . . . . . 692

11.5 Software for Time Series, Neural Network, Neuro-fuzzy


and Fuzzy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
11.6 Introduction to Graph Theory with Applications . . . . . . 700
11.7 Applications of Spline Polynomials . . . . . . . . . . . . . . 709
11.7.1 Polynomial Interpolation . . . . . . . . . . . . . . . 709
11.7.2 Spline Interpolation . . . . . . . . . . . . . . . . . . 712
11.8 Compression Sensing . . . . . . . . . . . . . . . . . . . . . . 715
11.9 Applications of Lozi Mappings . . . . . . . . . . . . . . . . 718
11.9.1 Lozi Mappings and Secure Communications . . . . 718
11.10 Introduction to Maxwell Equations with Applications . . . 719
11.11 Stochastic Calculus for Engineering Problems . . . . . . . 721
11.11.1 Stochastic Integration . . . . . . . . . . . . . . . . . 722
11.11.2 Stochastic Differential Equations . . . . . . . . . . . 723
11.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
11.13 Suggestion for Further Reading . . . . . . . . . . . . . . . 725

Bibliography 727

Appendix A Basic Concept of Calculus 733

A.1 Number System . . . . . . . . . . . . . . . . . . . . . . . . . 733


A.2 Intervals, Absolute Value and Inequalities . . . . . . . . . . 735
A.3 Binomial Formula and Quadratic Formula . . . . . . . . . . 735
A.4 Analytic Geometry and Trigonometry . . . . . . . . . . . . 736
A.5 Logarithmic and Exponential Functions . . . . . . . . . . . 743
A.6 Integration by Parts . . . . . . . . . . . . . . . . . . . . . . 744
A.7 Integration Formulas . . . . . . . . . . . . . . . . . . . . . . 745
A.8 Gamma Function . . . . . . . . . . . . . . . . . . . . . . . . 746
A.9 Definition of Multiple Integrals . . . . . . . . . . . . . . . . 746

Appendix B Summary of Properties of Matrices 749

B.1 Properties of Matrix . . . . . . . . . . . . . . . . . . . . . . 749


B.2 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . 753
B.3 Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . 754
B.4 Sobolev Space . . . . . . . . . . . . . . . . . . . . . . . . . . 754

Appendix C Proof of Selected Theorems 755

C.1 Fundamental Theorem of Calculus . . . . . . . . . . . . . . 755


C.2 Green-Ostrogradski Theorem . . . . . . . . . . . . . . . . . 760
C.3 The Divergence Theorem of Gauss . . . . . . . . . . . . . . 761
C.4 Stokes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 764
C.5 Conservative Fields . . . . . . . . . . . . . . . . . . . . . . . 766
C.6 Proofs of Properties of Determinants . . . . . . . . . . . . . 768

Appendix D Basic Concepts in Medical Imaging and Oil


Exploration 771

D.1 Fundamental Steps in Digital Image Processing . . . . . . . 771


D.2 Introduction to Medical Imaging . . . . . . . . . . . . . . . 772
D.3 Core Data and Well Loggings . . . . . . . . . . . . . . . . . 773

Appendix E Solution of Odd Number Exercises 775

E.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775

Bibliography 801

Computer Programs Used 803

Index 823
Preface

This book, entitled Modern Engineering Mathematics, is a compendium of fundamental mathematical concepts, methods, models and their wide range of
applications in diverse fields of engineering. It comprises essentially a contemporary coverage of those areas of mathematics which provide the foundation for
electronic, electrical, communication, petroleum, chemical, civil, mechanical,
biomedical, software and financial engineering. The rapid growth of technology and engineering science based on mathematical concepts is the driving
force enlarging the domain of modern engineering mathematics.
The book gives a fairly extensive treatment to some of the recent devel-
opments in mathematics which have found very significant applications to
engineering problems. Rich illustrative engineering examples and the integra-
tion of MATLAB and MATHEMATICA further support readers of the book.
Essential topics of mathematics required by all types of engineers are covered
clearly and concisely and fundamental ideas are highlighted and developed
using illustrative examples. Extensive exercises and self-assessment questions
create confidence among students to apply mathematics to engineering prob-
lems.
Contemporary topics used by engineers to solve their problems are included,
along with well-thought-out applications appropriate for undergraduate engineering students. Unlike most engineering mathematics books, the presentation
and the solved and unsolved examples are tailored particularly for engineering
students. A special feature of this book is that it assumes only a bare minimum
of pre-engineering mathematics and omits proofs of theoretical results not
required by engineers; such results are discussed in the appendices. The
following special features are hallmarks of the book which make it distinct
from existing books on the theme.
• Some new areas relevant to exploration and production of hydrocar-
bons (petroleum engineering), biomedical engineering (biomedical im-
ages, particularly CT scans, MRI, EEG), financial engineering (Black-
Scholes model), biometrics for identification of persons, brain-machine in-
terface, software engineering and information technology.
• Foundations and updated special mathematical techniques most appro-
priate to students of electrical, electronic, communication, computer,
software, petroleum, chemical and financial engineering.
• Lucid presentation of topics like wavelet transforms, inverse problems,
the Radon transform, fractals, compression sensing and other non-classical
topics and their applications is an attractive feature. Another
interesting feature is a wealth of examples providing clues for tackling
engineering problems.
• Visualization of ODEs (ordinary differential equations) and PDEs (par-
tial differential equations) by MATLAB and MATHEMATICA is pre-
sented. We also cover topics like matrices, ODEs, PDEs, and complex
analysis along with their applications. Suggestions for further reading
are given at the end of each chapter. Such lists are useful for those who
intend to pursue specific topics at more advanced levels.
The book comprises eleven chapters and five appendices. Each chapter is di-
vided into sections and subsections. Chapter 1 is devoted to matrices and
their applications to engineering problems. Applications of matrix analysis
to traffic flow, electrical networks, chemical equations, cryptography, trans-
form coding, web searches, and ranking are notable themes of this chapter.
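The flavor of Chapter 1's matrix computations can be sketched in Python with NumPy as an illustrative stand-in for the MATLAB sessions used in the book (the matrices below are invented for the example, not taken from the text):

```python
import numpy as np

# A small linear system A x = b, the kind arising from the traffic-flow
# and circuit models of Section 1.3 (entries chosen for illustration).
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([3.0, 1.0])
x = np.linalg.solve(A, b)        # unique solution since det(A) = -2 != 0

# Eigenvalues of a symmetric matrix (Section 1.6); eigh returns them in
# ascending order together with orthonormal eigenvectors.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eigh(S)
```

Here `x` works out to (2, 1) and the eigenvalues to 1 and 3; in MATLAB the backslash operator `A\b` and `eig(S)` play the corresponding roles.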
MATHEMATICA and MATLAB are introduced. Chapter 2 begins with moti-
vation for studying differential equations by future engineers. Applications of
MATLAB and MATHEMATICA are also presented in this chapter along with
modeling of physical phenomena such as population dynamics, radioactive de-
cay, Newton’s law of cooling, spread of disease, series circuits, falling bodies,
draining a tank, carbon dating and spring systems by differential equations.
Classical methods of solution of these models, namely undetermined coefficients, variation of parameters and the Cauchy-Euler method, are presented. The
Laplace transform and its applications to solving linear differential equations, as
well as series solutions of two well-known differential equations, those of Legendre and
Bessel, are discussed.
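As one concrete instance of the Chapter 2 models, Newton's law of cooling dT/dt = -k(T - Ta) has the exact solution T(t) = Ta + (T0 - Ta)e^(-kt). A minimal sketch (in Python rather than the book's MATLAB, with parameter values made up for the example) compares that closed form against a forward-Euler integration:

```python
import math

# Newton's law of cooling: dT/dt = -k (T - Ta).
# Illustrative parameters: rate k, ambient Ta, initial temperature T0.
k, Ta, T0 = 0.5, 20.0, 90.0
t_end, n = 4.0, 4000
dt = t_end / n

T = T0
for _ in range(n):
    T += dt * (-k * (T - Ta))    # one forward-Euler step

exact = Ta + (T0 - Ta) * math.exp(-k * t_end)
```

With this step size the Euler result agrees with the exact solution to within a few hundredths of a degree, and the temperature decays monotonically toward the ambient value, as the model predicts.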
Chapter 3 deals with vector calculus, providing basic properties including
three fundamental theorems: Green's theorem, the divergence theorem of Gauss
and Stokes' theorem, and applications of these results to engineering problems.
Concepts of orthonormal systems along with several concrete examples,
Fourier expansion, Fourier transform, Fourier integral and fast Fourier trans-
form are explained in Chapter 4. Applications to signal analysis are intro-
duced, providing foundations of information technology. The Shannon sampling the-
orem is explained. Heat, wave and Laplace equations are used in different
branches of engineering. Solutions of these equations with appropriate ini-
tial and boundary conditions applying Fourier series, Fourier transform, and
Laplace transform are discussed in Chapter 5. Visualization of solutions by
MATLAB is presented and for visualization by MATHEMATICA we refer to
reference [1] of Chapter 5.
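The Fourier methods of Chapters 4 and 5 can also be tasted numerically. The fragment below (illustrative only, not from the book) sums partial Fourier series of the square wave sign(sin x), whose classical expansion is (4/π) Σ over odd n of sin(nx)/n:

```python
import numpy as np

def square_wave_partial(x, terms):
    """Partial Fourier sum of the square wave sign(sin x):
    (4/pi) * sum over the first `terms` odd harmonics of sin(n x)/n."""
    n = np.arange(1, 2 * terms, 2)                  # odd harmonics 1, 3, 5, ...
    return (4.0 / np.pi) * np.sum(np.sin(np.outer(x, n)) / n, axis=1)

x = np.array([np.pi / 2])
approx = square_wave_partial(x, 200)   # approaches f(pi/2) = 1
```

At x = π/2 the 200-term partial sum is within about 0.003 of the true value 1; the slow, oscillatory convergence near the jumps is the Gibbs phenomenon discussed in treatments of Fourier series.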
Chapter 6 is devoted to algorithmic optimization. Besides well-known clas-
sical topics, new concepts like variational inequalities, Krylov methods, and
multigrid methods are introduced. Numerical methods for solving engineering
problems, such as the finite difference and finite element methods, are discussed
in Chapter 7. Several case studies from diverse fields are presented.
Basic results of complex analysis essential for understanding and solving
engineering problems are discussed in Chapter 8. Major challenging problems
of engineering take the form of inverse problems, for example, locating a
tumor by CT scan or X-ray, determining the nature of an inaccessible region
from measurements on its boundary, and reconstructing past events from
observation of the present state. These problems are explained in Chapter 9
with the help of examples of engineering problems. This is an emerging area
gaining popularity in various branches of engineering all over the world.
Chapter 10 is devoted to wavelets, a topic introduced in the early 1980s and
quite popular among engineers. It is a refinement of Fourier series. An
introduction to this field is presented along with various applications to
engineering problems.
There are several topics introduced in the last few decades which have
been used by engineers and scientists to solve challenging problems of the
real world. Some of these topics such as fractals, interaction of fractals and
wavelets, fractal image compression, study of physical phenomena through
time series and wavelet fractal methods, graph theory, applications of Lozi
maps, neural networks, large networks, fuzzy logic, ANFIS, splines, stochastic
calculus and compressed sensing are introduced in Chapter 11.
Appendix A is devoted to basic concepts of calculus. Appendix B deals
with pre-requisite topics. Proofs of selected theorems are given in Appendix C.
Appendix D presents basic concepts of medical imaging. Solutions of selected
exercises are given in Appendix E. Selected references are given at the end
of each chapter.
MATLAB® is a registered trademark of The MathWorks, Inc. For product
information please contact:

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: [email protected]
Web: www.mathworks.com
A. H. Siddiqi
M. Al-Lawati
M. Boulbrachene

The book has been completed with the active support of Prof. P. Manchanda,
Pooja, Mamta Rani, Prof. R. Bhardwaj, Dr. Noor Zahra, Ashima Bangia and
Sana Arif.
Acknowledgments

The idea of writing this book was proposed during the 2014 Spring Semester
while A. H. Siddiqi served as a consultant in the Department of Mathematics
and Statistics at Sultan Qaboos University, Muscat, Oman. The proposal
was finalized in a series of meetings in the office of Prof. M. Al-Lawati who
headed the department. Taylor & Francis Group then accepted the proposal
and agreed to publish the book. We appreciate the efforts of Aastha Sharma,
the publisher’s acquisition editor who arranged the contract and provided
support throughout the writing process.
We take this opportunity to thank Sultan Qaboos University, Oman, and
Sharda University, Greater Noida, India, especially its Chancellor, Mr. P. K.
Gupta, for providing immense support for this book writing project. We also
want to thank Prof. P. Manchanda who heads the
Department of Mathematics at Guru Nanak Dev University, Amritsar, India
for her valuable help in enabling us to complete this book.

A. H. Siddiqi

Chapter 1
Matrices for Engineers

1.1 Vectors in R2 and R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2


1.1.1 Dot and Cross Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2 Linear Dependence and Independence . . . . . . . . . . . . . . . . . . 9
1.1.3 Gram-Schmidt Orthogonalization Process . . . . . . . . . . . . . . 10
1.2 Basic Concepts of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 Linear Equations and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.2 Rank of Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.2.3 Special Classes of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.2.4 Echelon Form of Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 Applications of Matrices to Real World Problems . . . . . . . . . . . . . . . 26
1.3.1 Modeling of Temperature Distribution by Matrices . . . . 26
1.3.2 Modeling of Traffic Flow by Matrices . . . . . . . . . . . . . . . . . . . 28
1.3.3 Matrices for Chemical Balance Equations . . . . . . . . . . . . . . 30
1.3.4 Modeling by Matrix Equation in Business . . . . . . . . . . . . . . 31
1.3.5 Role of Matrices in Electrical Networks . . . . . . . . . . . . . . . . . 33
1.4 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.4.2 Determinants of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4.3 Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4.4 Cramer’s Rule: Application of Determinants to Solve
Matrix Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.5 Inverse of Matrix and Its Computation . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.6 Eigenvalue Problems for Square Matrices . . . . . . . . . . . . . . . . . . . . . . . 47
1.6.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.6.2 Diagonalization and Similar Matrices . . . . . . . . . . . . . . . . . . . 53
1.6.3 Orthogonalization and Diagonalization of 2 × 2 Matrices 55
1.7 Miscellaneous Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.7.1 Digital Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.7.2 Matrices in Digital Image Compression . . . . . . . . . . . . . . . . . 63
1.7.3 Cryptography with Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
1.7.4 Transform Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
1.7.5 Markov Matrix and Markov Process . . . . . . . . . . . . . . . . . . . . 70
1.8 Introduction to MATLAB® and MATHEMATICA . . . . . . . . . . . . 75
1.8.1 MATLAB for Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
1.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
1.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111


1.1 Vectors in R2 and R3


In engineering, physics and other applied fields a quantity having magni-
tude and direction is called a vector or a vector quantity. A physical quantity
having only magnitude is called a scalar and is represented by a single num-
ber using appropriate units.
Force, weight, velocity and acceleration are examples of vector quantities
while mass, speed, and temperature are examples of scalar quantities.
Vectors in $\mathbb{R}^2$
A vector whose initial point (tail) is $A$ and whose terminal point (tip) is
$B$ is written $\overrightarrow{AB}$. The magnitude of a vector is written as
$\|\overrightarrow{AB}\|$. Two vectors that have the same magnitude and the
same direction are said to be equal. Thus in Figure 1.1 we have
$\overrightarrow{AB} = \overrightarrow{CD}$.

FIGURE 1.1: Equal Vectors

The negative of a vector $\overrightarrow{AB}$, written $-\overrightarrow{AB}$,
is a vector that has the same magnitude but is opposite in direction. If
$\alpha \neq 0$ is a scalar, $\alpha\overrightarrow{AB}$ is called a scalar
multiple of the vector $\overrightarrow{AB}$; it is a vector $|\alpha|$ times
as long as $\overrightarrow{AB}$. If $k > 0$, then $k\overrightarrow{AB}$ has
the same direction as $\overrightarrow{AB}$; for $k < 0$, the direction of
$k\overrightarrow{AB}$ is opposite to that of $\overrightarrow{AB}$; see
Figure 1.2.
Addition and Subtraction
Let two vectors $\overrightarrow{AB}$ and $\overrightarrow{AC}$ have a common
initial point $A$, as in Figure 1.3. The diagonal $\overrightarrow{AD}$ of the
parallelogram having sides $\overrightarrow{AB}$ and $\overrightarrow{AC}$
represents the sum of the vectors $\overrightarrow{AB}$ and
$\overrightarrow{AC}$. The difference of the vectors $\overrightarrow{AB}$ and
$\overrightarrow{AC}$ is defined as
$$\overrightarrow{AB} - \overrightarrow{AC} = \overrightarrow{AB} + (-\overrightarrow{AC}).$$

FIGURE 1.2: Parallel Vectors

FIGURE 1.3: Addition of Vectors ($\overrightarrow{AD} = \overrightarrow{AB} + \overrightarrow{AC}$)

If $k = 0$, then $0\,\overrightarrow{AB} = \mathbf{0}$; the zero vector
$\mathbf{0}$ can be assigned any direction. A vector $\overrightarrow{AB}$ in
$\mathbb{R}^2$ is often written as $\overrightarrow{AB} = a = (a_1, a_2)$,
where $a_1$ and $a_2$ are real numbers; that is, a vector in $\mathbb{R}^2$
is represented by an ordered pair of real numbers.

Definition 1. Let a = (a1 , a2 ) and b = (b1 , b2 ) be two vectors in R2 , then


(i) Addition: a + b = (a1 + b1 , a2 + b2 )
(ii) Scalar multiplication: αa = (αa1 , αa2 )
(iii) Equality: a = b if and only if a1 = b1 , a2 = b2 .

Remark 1. (i) $-a = (-a_1, -a_2)$
(ii) $a - b = (a_1 - b_1, a_2 - b_2)$
(iii) $a + b = b + a$ for all vectors $a$ and $b$
(iv) $a + (b + c) = (a + b) + c$ for all vectors $a$, $b$ and $c$
(v) $a + 0 = a$
(vi) $a + (-a) = 0$
(vii) $k(a + b) = ka + kb$
(viii) $(k_1 + k_2)a = k_1 a + k_2 a$
(ix) $k_1(k_2 a) = (k_1 k_2)a$, where $k$, $k_1$ and $k_2$ are scalars
(x) $1a = a$
(xi) $0a = 0$.
The magnitude of $a$ is denoted by $\|a\|$ and is defined as
$$\|a\| = (a_1^2 + a_2^2)^{1/2}.$$
A vector $u = (u_1, u_2)$ is called a unit vector if $\|u\| = 1$. It may be
observed that for $a \neq 0$, $u = \dfrac{a}{\|a\|}$ is a unit vector:
$\|u\| = \left\| \dfrac{a}{\|a\|} \right\| = 1$.
Let $i = (1, 0)$, $j = (0, 1)$; then $a = (a_1, a_2)$ can be written as
$a = a_1 i + a_2 j$, since
$$a_1 i + a_2 j = a_1(1, 0) + a_2(0, 1) = (a_1, 0) + (0, a_2) = (a_1, a_2) = a.$$

Example 1. (a) Find the magnitudes of the vectors $2i + 4j$ and $(2, 0)$.
(b) Given $a = (4, -1)$, form a unit vector.
(c) Find $5a$, $a + b$, $a - b$, $\|a + b\|$ and $\|a - b\|$ if $a = 4i + 8j$,
$b = -2i + 8j$.
(d) Find a vector in the opposite direction of $a = (4, 10)$ but $\frac{3}{4}$
as long.
Solution: (a) $\|2i + 4j\| = \|(2, 4)\| = (4 + 16)^{1/2} = (20)^{1/2} = 2\sqrt{5}$
and $\|(2, 0)\| = \sqrt{4} = 2$.
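Computations like those in Example 1 are easy to check numerically. The following short Python sketch (the helper names `magnitude` and `unit` are our own, not from the text) reproduces the magnitudes of part (a) and the unit vector of part (b):

```python
import math

def magnitude(v):
    """Euclidean length of a vector given as a tuple of components."""
    return math.sqrt(sum(c * c for c in v))

def unit(v):
    """Unit vector a/||a|| in the direction of a non-zero vector v."""
    m = magnitude(v)
    return tuple(c / m for c in v)

# Part (a): ||2i + 4j|| = 2*sqrt(5) and ||(2, 0)|| = 2
print(magnitude((2, 4)))
print(magnitude((2, 0)))
# Part (b): a unit vector formed from a = (4, -1)
print(unit((4, -1)))
```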

Vectors in $\mathbb{R}^3$
A vector in $\mathbb{R}^3$ is represented by an ordered triple of numbers
$(a_1, a_2, a_3)$. As in $\mathbb{R}^2$, the magnitude is defined as
$\|a\| = (a_1^2 + a_2^2 + a_3^2)^{1/2}$.
The sum of two vectors $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ in
$\mathbb{R}^3$ is defined as
$$a + b = (a_1 + b_1, a_2 + b_2, a_3 + b_3).$$
The distance between two vectors $a$ and $b$ in $\mathbb{R}^3$ is defined as
$$d(a, b) = \|a - b\| = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2}.$$
Remark 2. The algebraic properties of vectors in $\mathbb{R}^2$ carry over to
vectors in $\mathbb{R}^3$; for example, for $a = (a_1, a_2, a_3)$ and
$b = (b_1, b_2, b_3)$ we have $a + b = b + a$, $1a = a$ and
$0a = 0 = (0, 0, 0)$.
Let $i = (1, 0, 0)$, $j = (0, 1, 0)$, $k = (0, 0, 1) \in \mathbb{R}^3$; then
$a = (a_1, a_2, a_3) = a_1 i + a_2 j + a_3 k$, since
$$a_1(1, 0, 0) + a_2(0, 1, 0) + a_3(0, 0, 1) = (a_1, 0, 0) + (0, a_2, 0) + (0, 0, a_3) = (a_1, a_2, a_3) = a.$$
 
Example 2. (a) Find the magnitude of $a = \left(-\dfrac{4}{7}, \dfrac{6}{7}, \dfrac{12}{7}\right)$.
(b) Express the vector $a = (14, -15, 26)$ in terms of the unit vectors $i$, $j$, $k$.
(c) Find $a + (b + c)$, $b + 2(a - 3c)$, $\|a + b\|$, $\|a - c\|$ and
$\|b\|\|c\|$, where $a = (1, -3, 2)$, $b = (-1, 1, 1)$ and $c = (2, 6, 9)$.

1.1.1 Dot and Cross Products


Let $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ be two vectors in
$\mathbb{R}^3$. Then the dot product of $a$ and $b$, denoted by $a \cdot b$,
is defined as
$$a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3. \qquad (1.1)$$
The cross product of $a$ and $b$, denoted by $a \times b$, is defined as
$$a \times b = \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = (a_2 b_3 - a_3 b_2)i - (a_1 b_3 - a_3 b_1)j + (a_1 b_2 - a_2 b_1)k. \qquad (1.2)$$

The magnitude of the scalar $a \cdot b$ is its absolute value,
$$|a \cdot b| = |a_1 b_1 + a_2 b_2 + a_3 b_3|,$$
while
$$\|a \times b\| = \left((a_2 b_3 - a_3 b_2)^2 + (a_1 b_3 - a_3 b_1)^2 + (a_1 b_2 - a_2 b_1)^2\right)^{1/2}.$$
The distance between $a$ and $b$, denoted by $\|a - b\|$, is given by
$$\|a - b\| = \|(a_1 - b_1)i + (a_2 - b_2)j + (a_3 - b_3)k\| = \left((a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2\right)^{1/2}.$$
Furthermore,
$$\|a \times b\|^2 = (a \times b) \cdot (a \times b) = a \cdot [b \times (a \times b)]$$
by the relation $(a \times b) \cdot c = a \cdot (b \times c)$, so that
$$\|a \times b\|^2 = a \cdot [(b \cdot b)a - (b \cdot a)b] = (a \cdot a)(b \cdot b) - (a \cdot b)(a \cdot b) = \|a\|^2 \|b\|^2 - (a \cdot b)^2.$$
We know that $a \cdot b = \|a\|\|b\| \cos\theta$, where $\theta$ is the angle
between the vectors $a$ and $b$, and so
$\|a \times b\|^2 = \|a\|^2\|b\|^2 - \|a\|^2\|b\|^2 \cos^2\theta$, or
$$\|a \times b\|^2 = \|a\|^2\|b\|^2 (1 - \cos^2\theta) = \|a\|^2\|b\|^2 \sin^2\theta,$$
so that
$$\|a \times b\| = \|a\|\|b\| \sin\theta, \qquad 0 \leq \theta \leq \pi. \qquad (1.3)$$

With $i = (1, 0, 0)$, $j = (0, 1, 0)$, $k = (0, 0, 1)$, we can write
$a = a_1 i + a_2 j + a_3 k$ and $b = b_1 i + b_2 j + b_3 k$.
It may be noticed that $a \times b$ is a vector while $a \cdot b$ is a scalar
quantity.
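The identity $\|a \times b\|^2 = \|a\|^2\|b\|^2 - (a \cdot b)^2$ derived above, and the orthogonality of $a \times b$ to both of its factors, can be checked numerically. A minimal Python sketch (the sample vectors are arbitrary choices of ours, not from the text):

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors, per formula (1.2)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

a, b = (1.0, -3.0, 2.0), (2.0, 0.0, -1.0)
c = cross(a, b)
lhs = dot(c, c)                               # ||a x b||^2
rhs = dot(a, a) * dot(b, b) - dot(a, b) ** 2  # ||a||^2 ||b||^2 - (a.b)^2
print(lhs, rhs)              # the two sides agree
print(dot(a, c), dot(b, c))  # both zero: a x b is orthogonal to a and b
```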
Theorem 1. For vectors $a$, $b$, $c$ and a scalar $\alpha$, the following
properties hold:
(a) $a \times b = -(b \times a)$
(b) $a \times (b + c) = a \times b + a \times c$
(c) $(a + b) \times c = a \times c + b \times c$
(d) $a \times (\alpha b) = (\alpha a) \times b = \alpha(a \times b)$
(e) $a \times b = 0$ if either $a = 0$ or $b = 0$
(f) $a \times a = 0$
(g) $a \cdot (a \times b) = 0$
(h) $b \cdot (a \times b) = 0$.
The dot product is also known as the inner product or scalar product. It can
also be defined as $a \cdot b = \|a\|\|b\| \cos\theta$, where $\theta$ is the
angle between the vectors.
Example 3. (a) Let $i = (1, 0, 0)$, $j = (0, 1, 0)$ and $k = (0, 0, 1)$. Show
that $i \cdot i = 1$, $j \cdot j = 1$, $k \cdot k = 1$ and
$\|i\| = \|j\| = \|k\| = 1$, so that in each case $\cos\theta = 1$.
(b) Show that $\|a\| = \sqrt{a \cdot a} = (a_1^2 + a_2^2 + a_3^2)^{1/2}$.
(c) Show that $i \cdot j = j \cdot i = 0$, $j \cdot k = k \cdot j = 0$ and
$k \cdot i = i \cdot k = 0$.
Solution: (a) $i \cdot i = (1, 0, 0) \cdot (1, 0, 0) = 1 + 0 + 0 = 1$.
Similarly $j \cdot j = 1$, $k \cdot k = 1$, and
$\|i\| = \sqrt{1^2 + 0^2 + 0^2} = 1$.
(b) $\sqrt{a \cdot a} = \sqrt{(a_1, a_2, a_3) \cdot (a_1, a_2, a_3)} = \sqrt{a_1^2 + a_2^2 + a_3^2}$.
Hence $\|a\| = \sqrt{a \cdot a}$.
(c) $i \cdot j = (1, 0, 0) \cdot (0, 1, 0) = 0 + 0 + 0 = 0$ and
$j \cdot i = (0, 1, 0) \cdot (1, 0, 0) = 0$. Similarly
$j \cdot k = k \cdot j = 0$ and $k \cdot i = i \cdot k = 0$.
Definition 2. Vectors a and b are said to be orthogonal (perpendicular) if
a.b = 0 (dot product of a and b is zero).
Definition 3. (a) The angle between two vectors $a = a_1 i + a_2 j + a_3 k$
and $b = b_1 i + b_2 j + b_3 k$ is defined as
$$\theta = \cos^{-1} \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\|a\|\|b\|}.$$
(b) For a non-zero vector $a = a_1 i + a_2 j + a_3 k$ in $\mathbb{R}^3$, the
angles $\alpha$, $\beta$, $\gamma$ between $a$ and the unit vectors $i$, $j$
and $k$, respectively, are called the direction angles of $a$ and are given by
$$\cos\alpha = \frac{a \cdot i}{\|a\|\|i\|} = \frac{a_1}{\|a\|}, \qquad \cos\beta = \frac{a_2}{\|a\|}, \qquad \cos\gamma = \frac{a_3}{\|a\|},$$
where $\cos\alpha$, $\cos\beta$ and $\cos\gamma$ are called the direction
cosines of $a$. The direction cosines of a non-zero vector $a$ are simply the
components of the unit vector $\dfrac{1}{\|a\|}a$:
$$\frac{1}{\|a\|}a = \frac{a_1}{\|a\|}i + \frac{a_2}{\|a\|}j + \frac{a_3}{\|a\|}k = \cos\alpha\, i + \cos\beta\, j + \cos\gamma\, k.$$
Definition 4. The component of $a$ on an arbitrary vector $b$, denoted by
$\mathrm{comp}_b\, a$, is defined as
$$\mathrm{comp}_b\, a = a \cdot \left(\frac{1}{\|b\|}b\right) = \frac{a \cdot b}{\|b\|}.$$

Example 4. (a) Find the angle between $a = i + j + k$ and $b = -i + j + k$.
(b) Find the direction cosines of the vector $a = i + 2j + 3k$.
(c) Find $\mathrm{comp}_b\, a$, where $a = 4i + 6j - 4k$ and $b = i + j + 4k$.
Solution: (a) $\cos\theta = \dfrac{-1 + 1 + 1}{\sqrt{3}\sqrt{1 + 1 + 1}} = \dfrac{1}{3}$,
so $\theta = \cos^{-1} \dfrac{1}{3}$.
(b) $\cos\alpha = \dfrac{a_1}{\|a\|} = \dfrac{1}{\sqrt{1 + 4 + 9}} = \dfrac{1}{\sqrt{14}}$,
$\cos\beta = \dfrac{a_2}{\|a\|} = \dfrac{2}{\sqrt{14}}$,
$\cos\gamma = \dfrac{a_3}{\|a\|} = \dfrac{3}{\sqrt{14}}$.
(c) $\mathrm{comp}_b\, a = \dfrac{a \cdot b}{\|b\|} = \dfrac{4 + 6 - 16}{\sqrt{1 + 1 + 16}} = \dfrac{-6}{\sqrt{18}}$.
It may be observed that a physical interpretation of the dot product is the
work done when a constant force $F$ moves an object through a displacement
$d$, that is, $W = F \cdot d$.
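The angle, direction-cosine and component formulas above translate directly into code. A Python sketch (helper names are ours) reproducing the numbers of Example 4:

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm ||a||."""
    return math.sqrt(dot(a, a))

# Example 4(a): angle between a = i + j + k and b = -i + j + k
a, b = (1, 1, 1), (-1, 1, 1)
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))

# Example 4(b): direction cosines of a = i + 2j + 3k
v = (1, 2, 3)
cosines = tuple(c / norm(v) for c in v)

# Example 4(c): comp_b a = (a.b)/||b||
a2, b2 = (4, 6, -4), (1, 1, 4)
comp = dot(a2, b2) / norm(b2)
print(theta, cosines, comp)
```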

Physical meaning of cross product
Let the vector $a$ denote side $AC$ and $b$ denote side $AB$ of a
parallelogram, as shown in Figure 1.4. Then the area of the parallelogram is
$\|a \times b\|$, the magnitude of the vector $a \times b$, and the area of
the triangle $ABC$ is $\frac{1}{2}\|a \times b\|$.

FIGURE 1.4: Geometrical Meaning of Cross Product
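This area formula is easy to apply in code. A Python sketch (helper names are ours) computing the triangle area asked for in Example 6(i) below:

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

# Example 6(i): triangle with adjacent sides a = (1, 2, 3), b = (1, -3, 5)
a, b = (1, 2, 3), (1, -3, 5)
area_parallelogram = norm(cross(a, b))   # ||a x b||
area_triangle = area_parallelogram / 2   # (1/2)||a x b|| = sqrt(390)/2
print(area_triangle)
```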

Definition 5. (a) Two vectors a and b are parallel if a × b = 0.


(b) Three vectors a, b and c are coplanar (lie in the same plane) if

a.(b × c) = 0.

Example 5. (a) Show that $i \times j = k$, $j \times k = i$, $k \times i = j$,
$j \times i = -k$, $k \times j = -i$, $i \times k = -j$.
(b) Show that
$$a \cdot (b \times c) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix},$$
where $a = (a_1, a_2, a_3)$, $b = (b_1, b_2, b_3)$, $c = (c_1, c_2, c_3)$.
Show also that $a \cdot (b \times c) = (a \times b) \cdot c$.
Example 6. (i) Find the area of a triangle whose adjacent sides are given by
the vectors $a = (1, 2, 3)$, $b = (1, -3, 5)$. (ii) For $a = (5, -2, 1)$,
$b = (2, 0, -7)$ show that $a \cdot (a \times b) = 0$ and
$b \cdot (a \times b) = 0$.

1.1.2 Linear Dependence and Independence


A subset S of R3 is called a subspace if x, y ∈ S implies that x + y ∈ S
and αx ∈ S for any real number α.
Definition 6. A set of vectors {a, b, c} is said to be linearly independent
or vectors a, b and c are called linearly independent if the only constants
satisfying the equation
αa + βb + γc = 0
are α = 0, β = 0, γ = 0. If a, b and c are not linearly independent then the
set of these vectors is linearly dependent and these vectors are also linearly
dependent.
Example 7. (a) Show that i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) are
linearly independent.
(b) Show that the vectors (1, 1) and (1, 2) are linearly independent.
(c) Examine whether the vectors $a = (1, 1, 1)$, $b = (2, -1, 4)$ and
$c = (5, 2, 7)$ are linearly independent or linearly dependent.
Solution: (a) Let α(1, 0, 0)+β(0, 1, 0)+γ(0, 0, 1) = 0 then (α, β, γ) = 0 which
implies that α = β = γ = 0. Hence these vectors are linearly independent.
(b) Let α(1, 1) + β(1, 2) = 0 then (α + β, α + 2β) = 0 or

α + β = 0, α + 2β = 0

implying β = 0 and consequently α = 0. Thus these vectors are linearly



independent.
(c) Let $\alpha a + \beta b + \gamma c = 0$. Then
$$\alpha(1, 1, 1) + \beta(2, -1, 4) + \gamma(5, 2, 7) = 0,$$
that is,
$$(\alpha + 2\beta + 5\gamma,\; \alpha - \beta + 2\gamma,\; \alpha + 4\beta + 7\gamma) = 0.$$
The first equation gives $\alpha = -2\beta - 5\gamma$; substituting into the
second gives $-3\beta - 3\gamma = 0$, i.e. $\beta = -\gamma$, and the third
equation then gives $\alpha = 3\beta$. Thus there are non-trivial solutions,
for example $(\alpha, \beta, \gamma) = (3, 1, -1)$, so these three vectors are
linearly dependent.
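For three vectors in $\mathbb{R}^3$, linear dependence can also be checked numerically: the vectors are dependent exactly when the determinant of the $3 \times 3$ matrix having them as rows is zero (determinants are treated formally in Section 1.4). A Python sketch applied to Example 7:

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a tuple of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Example 7(c): a = (1, 1, 1), b = (2, -1, 4), c = (5, 2, 7)
dependent = ((1, 1, 1), (2, -1, 4), (5, 2, 7))
print(det3(dependent))   # 0 -> linearly dependent

# Example 7(a): i, j, k are independent (non-zero determinant)
print(det3(((1, 0, 0), (0, 1, 0), (0, 0, 1))))   # 1
```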
Definition 7. (Dimension) A set of elements in $\mathbb{R}^3$ is called a
basis if these elements are linearly independent and every element of
$\mathbb{R}^3$ is a linear combination of the basis elements. The number of
elements in a basis is called the dimension. The dimension of $\mathbb{R}^3$
is three, as $i$, $j$, $k$ are linearly independent and every element of
$\mathbb{R}^3$ is a linear combination of $i$, $j$, $k$ (for each
$x \in \mathbb{R}^3$, $x = \alpha i + \beta j + \gamma k$).
Definition 8. (Orthogonal set of vectors) A set of vectors in R3 , {a, b, c} is
called orthogonal if a.b = 0, b.c = 0, a.c = 0.

1.1.3 Gram-Schmidt Orthogonalization Process


The notion of $\mathbb{R}^3$ can be extended to $\mathbb{R}^n$ for any natural
number $n$. Jørgen Pedersen Gram (1850-1916) of Denmark, an important
functionary of an insurance company, and Erhard Schmidt (1876-1959) of
Germany, a distinguished faculty member of well known universities, developed
an algorithm for constructing orthonormal bases for $\mathbb{R}^n$ in
particular and any vector space in general.
Let $S = \{u_1, u_2, \ldots, u_m\}$, $m \leq n$, be a linearly independent
subset of $\mathbb{R}^n$. Define $\{v_1, v_2, \ldots, v_m\}$ by
$$v_1 = u_1,$$
$$v_2 = u_2 - \left(\frac{\langle u_2, v_1 \rangle}{\langle v_1, v_1 \rangle}\right) v_1,$$
$$v_3 = u_3 - \left(\frac{\langle u_3, v_1 \rangle}{\langle v_1, v_1 \rangle}\right) v_1 - \left(\frac{\langle u_3, v_2 \rangle}{\langle v_2, v_2 \rangle}\right) v_2,$$
$$\vdots$$
$$v_m = u_m - \left(\frac{\langle u_m, v_1 \rangle}{\langle v_1, v_1 \rangle}\right) v_1 - \left(\frac{\langle u_m, v_2 \rangle}{\langle v_2, v_2 \rangle}\right) v_2 - \cdots - \left(\frac{\langle u_m, v_{m-1} \rangle}{\langle v_{m-1}, v_{m-1} \rangle}\right) v_{m-1}.$$
Then
$$B = \left\{ \frac{1}{\|v_1\|} v_1, \frac{1}{\|v_2\|} v_2, \ldots, \frac{1}{\|v_m\|} v_m \right\}$$
is an orthonormal basis for the subspace of $\mathbb{R}^n$ spanned by $S$.
Remark 3. Gram-Schmidt orthogonalization can be extended for any vector
space.
Example 8. Let S = {u1 , u2 }, where u1 = (3, 1), u2 = (1, 1) is a basis for
R2 . Obtain an orthonormal basis, say {w1 , w2 } of R2 .
Solution: Let $v_1 = u_1 = (3, 1)$. Then
$$v_2 = u_2 - \left(\frac{\langle u_2, v_1 \rangle}{\langle v_1, v_1 \rangle}\right) v_1 = (1, 1) - \frac{4}{10}(3, 1) = \left(-\frac{1}{5}, \frac{3}{5}\right)$$
(as $\langle u_2, v_1 \rangle = 4$ and $\langle v_1, v_1 \rangle = 10$).
$\{v_1, v_2\} = \{(3, 1), (-\frac{1}{5}, \frac{3}{5})\}$ is an orthogonal
basis, and $\{w_1, w_2\}$, where
$$w_1 = \frac{1}{\|v_1\|} v_1 = \left(\frac{3}{\sqrt{10}}, \frac{1}{\sqrt{10}}\right), \qquad w_2 = \frac{1}{\|v_2\|} v_2 = \left(\frac{-1}{\sqrt{10}}, \frac{3}{\sqrt{10}}\right),$$
is an orthonormal basis.
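The Gram-Schmidt recursion above can be implemented in a few lines. A Python sketch (the function name `gram_schmidt` is ours) reproducing Example 8:

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    ortho = []
    for u in vectors:
        v = list(u)
        # subtract the projection of u onto each previously built v_k
        for w in ortho:
            coeff = dot(u, w) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        ortho.append(v)
    # normalize each v_k to obtain an orthonormal set
    return [tuple(x / math.sqrt(dot(v, v)) for x in v) for v in ortho]

# Example 8: u1 = (3, 1), u2 = (1, 1)
w1, w2 = gram_schmidt([(3, 1), (1, 1)])
print(w1)   # (3/sqrt(10), 1/sqrt(10))
print(w2)   # (-1/sqrt(10), 3/sqrt(10))
```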

1.2 Basic Concepts of Matrices


In this section we present an introduction to matrices and their relationship
with systems of linear equations, which occur frequently in diverse fields of
study such as physics, biology, chemistry, social science, medical science and
different branches of engineering. Matrix theory, the study of matrices,
provides a compact way of representing systems of linear equations and yields
efficient procedures for finding their solutions. In Section 1.2.1 we describe
the properties of matrices which we need in subsequent sections.
Definition 9. An $m \times n$ matrix $A = (a_{ij})$, $i = 1, 2, \ldots, m$ and
$j = 1, 2, \ldots, n$, is a rectangular array of $mn$ real or complex numbers
arranged in $m$ horizontal rows and $n$ vertical columns:
$$A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \qquad (1.4)$$

The $i$th row of $A$ is $(a_{i1}\; a_{i2}\; \ldots\; a_{in})$,
$1 \leq i \leq m$, while the $j$th column of $A$ is
$$\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}, \qquad 1 \leq j \leq n.$$

Note that $A$ is $m$ by $n$ (written $m \times n$). If $m = n$, we say that
$A$ is a square matrix of order $n$, and that the numbers
$a_{11}, a_{22}, a_{33}, \ldots, a_{nn}$ form the main diagonal of $A$. We
call the number $a_{ij}$, which is in the $i$th row and $j$th column of $A$,
the $(i, j)$th element of $A$, or the $(i, j)$ entry of $A$, and we often
write (1.4) as $A = (a_{ij})$.
   
Example 9. Let
$$A = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} i & 4i + 1 \\ 2 + 3i & -4 \end{pmatrix}, \quad C = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & -1 & 0 \\ 2 & 0 & 1 \\ 4 & -1 & 2 \end{pmatrix}, \quad E = \begin{pmatrix} 5 & 0 \end{pmatrix}, \quad F = \begin{pmatrix} -2 & 3 & 5 \end{pmatrix}.$$
(a) Write $a_{21}$ and $a_{22}$ of $A$.
(b) Write $b_{12}$ and $b_{22}$ of $B$.
(c) Write $c_{11}$ and $c_{31}$ of $C$.
(d) Write $d_{22}$ and $d_{32}$ of $D$.
(e) Write $e_{11}$ of $E$.
(f) Write $f_{12}$ and $f_{13}$ of $F$.
(g) Write the main diagonal of $D$.
Solution: (a) $a_{21} = -1$, $a_{22} = 0$.
(b) $b_{12} = 4i + 1$, $b_{22} = -4$.
(c) $c_{11} = 1$, $c_{31} = 3$.
(d) $d_{22} = 0$, $d_{32} = -1$.
(e) $e_{11} = 5$.
(f) $f_{12} = 3$, $f_{13} = 5$.
(g) The main diagonal of $D$ is $1, 0, 2$.
It may be pointed out that in this chapter we concentrate on matrices with
real entries, but matrices with complex numbers and functions as entries have
also been studied.
An $n \times 1$ matrix
$$\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix}$$
is called an $n$-vector or column vector. For example,
$$u = \begin{pmatrix} 1 \\ 3 \\ -2 \\ -1 \end{pmatrix} \text{ is a 4-vector and } v = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix} \text{ is a 3-vector.}$$
The $n$-vector all of whose entries are zero is denoted by $0$.
If $A$ is an $m \times n$ matrix, then the rows of $A$ are $1 \times n$
matrices and the columns of $A$ are $m \times 1$ matrices. The set of all
$n$-vectors with real entries is denoted by $\mathbb{R}^n$. A $1 \times n$
matrix is also called a row vector.
Example 10. (Matrix of distances between cities, in kilometers)

              London   Madrid   New York    Tokyo    Delhi
London             0      785       3469     5959     6704
Madrid           785        0       3593     6706     7255
New York        3469     3593          0     6757    11766
Tokyo           5959     6706       6757        0     5840
Delhi           6704     7255      11766     5840        0

Example 11. (Production matrix)
Assume that a company has five plants, each of which produces four products.
Let $a_{ij}$ denote the number of units of product $i$ produced by plant $j$
in one week, given by the $4 \times 5$ matrix below.

            Plant 1   Plant 2   Plant 3   Plant 4   Plant 5
Product 1       660       460       480       480         0
Product 2       440       550       520       520       100
Product 3       380       370       310       480       480
Product 4       385       375       312       310       385

Example 12. (Windchill table)
A combination of air temperature and wind speed makes a body feel colder than
the actual temperature. For example, when the temperature is 12°F and the
wind is 18 km per hour, a heat loss occurs equal to that when the temperature
is −20°F with no wind. Table 1.1 lists such windchill-equivalent temperatures.
Definition 10. Two m × n matrices A = (aij ) and B = (bij ) are said to be
equal if they agree entry by entry, that is aij = bij for i = 1, 2, . . . , m and
j = 1, 2, . . . , n.

TABLE 1.1: Windchill-equivalent temperatures in °F (rows: wind speed in kmph;
columns: air temperature in °F)

kmph     17    12     7     2    −3    −8
  5      12     7     0    −5   −10   −15
 10      −3    −9   −15   −22   −27   −34
 18     −11   −20   −25   −31   −38   −45
 20     −17   −24   −31   −39   −46   −53

   
Example 13. Let
$$A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 3 & 2 \\ 0 & -4 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 3 & 2 \\ 0 & -4 & 5 \end{pmatrix}.$$
Then $A$ and $B$ are equal.
Definition 11. Let $A = (a_{ij})$ and $B = (b_{ij})$ be two $m \times n$
matrices; then the sum $A + B$ is the $m \times n$ matrix $C = (c_{ij})$
defined by $c_{ij} = a_{ij} + b_{ij}$, $i = 1, 2, \ldots, m$ and
$j = 1, 2, \ldots, n$.

Note: We merely add corresponding entries.


   
Example 14. (a) Let
$$A = \begin{pmatrix} 4 & 2 & 1 \\ -3 & 5 & 6 \end{pmatrix} \text{ and } B = \begin{pmatrix} 2 & 1 & -3 \\ 2 & 1 & 3 \end{pmatrix}; \text{ then } A + B = \begin{pmatrix} 6 & 3 & -2 \\ -1 & 6 & 9 \end{pmatrix}.$$

(b) Suppose a production company manufactures items 1, 2 and 3. Each item is
partially made in factory $F_1$ in India and then finished in factory $F_2$
in Germany. The total cost of each item consists of the manufacturing cost
and the shipping cost. The costs at each factory (in dollars) can be described
by $3 \times 2$ matrices $F_1$ and $F_2$, whose columns give the manufacturing
and shipping costs and whose rows correspond to items 1, 2 and 3:
$$F_1 = \begin{pmatrix} 32 & 80 \\ 100 & 160 \\ 140 & 40 \end{pmatrix}, \qquad F_2 = \begin{pmatrix} 80 & 120 \\ 100 & 100 \\ 260 & 40 \end{pmatrix}.$$
The matrix
$$F_1 + F_2 = \begin{pmatrix} 112 & 200 \\ 200 & 260 \\ 400 & 80 \end{pmatrix}$$
provides the total manufacturing and shipping costs.

Remark 4. (a) The sum $A + B$ is defined only if $A$ and $B$ have the same
number of rows and the same number of columns, in other words if they are of
the same size. If
$$A = \begin{pmatrix} 2 & 3 & -1 \\ 1 & 3 & 4 \end{pmatrix} \text{ and } B = \begin{pmatrix} 1 & 1 \\ -1 & 2 \end{pmatrix},$$
then $A + B$ cannot be defined.
(b) Let $a = (a_1, a_2, a_3, \ldots, a_n)$; then $a + 0 = a$.
Definition 12. (Scalar multiplication) Let $A = (a_{ij})$ be an $m \times n$
matrix and $\alpha$ be a real number; then the scalar multiple of $A$ by
$\alpha$, denoted by $\alpha A$, is the $m \times n$ matrix $C = (c_{ij})$,
where $c_{ij} = \alpha a_{ij}$, $i = 1, 2, \ldots, m$ and
$j = 1, 2, \ldots, n$; that is, $C$ is obtained by multiplying each entry of
$A$ by $\alpha$:
$$C = (\alpha a_{ij}) = \begin{pmatrix} \alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn} \end{pmatrix}.$$
   
Example 15. Let
$$A = \begin{pmatrix} 2 & 3 & -1 \\ 1 & 1 & 1 \\ -1 & 6 & 5 \end{pmatrix}; \text{ then } 4A = \begin{pmatrix} 8 & 12 & -4 \\ 4 & 4 & 4 \\ -4 & 24 & 20 \end{pmatrix}.$$
   
Example 16. Let
$$A = \begin{pmatrix} 2 & 0 & 3 & 4 \\ 1 & 3 & 4 & 2 \\ 2 & -2 & 1 & 1 \end{pmatrix} \text{ and } B = \begin{pmatrix} -1 & 2 & 3 & 1 \\ 4 & 6 & 8 & -1 \\ 1 & 1 & 2 & 1 \end{pmatrix}.$$
Find (i) $A + \alpha B$ (ii) $A - \alpha B$ (iii) $A - B$.
Solution: (i)
$$\alpha B = \begin{pmatrix} -\alpha & 2\alpha & 3\alpha & \alpha \\ 4\alpha & 6\alpha & 8\alpha & -\alpha \\ \alpha & \alpha & 2\alpha & \alpha \end{pmatrix}, \qquad A + \alpha B = \begin{pmatrix} 2 - \alpha & 2\alpha & 3 + 3\alpha & 4 + \alpha \\ 1 + 4\alpha & 3 + 6\alpha & 4 + 8\alpha & 2 - \alpha \\ 2 + \alpha & -2 + \alpha & 1 + 2\alpha & 1 + \alpha \end{pmatrix};$$
for $\alpha = 3$,
$$A + 3B = \begin{pmatrix} -1 & 6 & 12 & 7 \\ 13 & 21 & 28 & -1 \\ 5 & 1 & 7 & 4 \end{pmatrix}.$$
(ii)
$$A - \alpha B = \begin{pmatrix} 2 + \alpha & -2\alpha & 3 - 3\alpha & 4 - \alpha \\ 1 - 4\alpha & 3 - 6\alpha & 4 - 8\alpha & 2 + \alpha \\ 2 - \alpha & -2 - \alpha & 1 - 2\alpha & 1 - \alpha \end{pmatrix};$$
for $\alpha = 3$,
$$A - 3B = \begin{pmatrix} 5 & -6 & -6 & 1 \\ -11 & -15 & -20 & 5 \\ -1 & -5 & -5 & -2 \end{pmatrix}.$$
(iii)
$$A - B = \begin{pmatrix} 3 & -2 & 0 & 3 \\ -3 & -3 & -4 & 3 \\ 1 & -3 & -1 & 0 \end{pmatrix}.$$
$A - B$ is called the difference of $A$ and $B$.
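Entrywise operations like these are simple to code. A Python sketch (helper names are ours) recomputing parts of Example 16 for $\alpha = 3$:

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of the same size (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(alpha, A):
    """Multiply every entry of A by the scalar alpha."""
    return [[alpha * x for x in row] for row in A]

A = [[2, 0, 3, 4], [1, 3, 4, 2], [2, -2, 1, 1]]
B = [[-1, 2, 3, 1], [4, 6, 8, -1], [1, 1, 2, 1]]

print(mat_add(A, scalar_mul(3, B)))    # A + 3B
print(mat_add(A, scalar_mul(-1, B)))   # A - B
```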


Example 17. (Inventory control) Suppose that a general merchant store deals
in 50 different items. The inventory at the beginning of the month can be
described by the inventory vector $u = (a_1, a_2, a_3, \ldots, a_{50})$ in
$\mathbb{R}^{50}$. The number of items sold by the end of the month is given
by the vector $v = (b_1, b_2, b_3, \ldots, b_{50})$, so the number of items
unsold at the end of the month is represented by the vector
$u - v = (a_1 - b_1, a_2 - b_2, \ldots, a_{50} - b_{50})$. If the store
receives a fresh shipment of goods, represented by the vector
$w = (c_1, c_2, c_3, \ldots, c_{50})$, then the new inventory of the store
would be $u - v + w = (a_1 - b_1 + c_1, a_2 - b_2 + c_2, \ldots, a_{50} - b_{50} + c_{50})$.
Summation Notation: By $\sum_{i=1}^{n} a_i$ we mean
$a_1 + a_2 + a_3 + \cdots + a_n$. The letter $i$ is the index of summation.
It is a dummy variable that can be replaced by another letter; thus we can
write $\sum_{i=1}^{n} a_i$ as $\sum_{j=1}^{n} a_j$ or $\sum_{k=1}^{n} a_k$.
The summation notation satisfies the following properties:
(i) $\sum_{i=1}^{n} (\alpha_i + \beta_i) a_i = \sum_{i=1}^{n} \alpha_i a_i + \sum_{i=1}^{n} \beta_i a_i$
(ii) $\sum_{i=1}^{n} \gamma (\alpha_i a_i) = \gamma \sum_{i=1}^{n} \alpha_i a_i$
(iii) $\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij}$.

Remark 5. (i) In property (iii), the left hand side is obtained by adding all
the entries in each row of the array $(a_{ij})$ and then adding all the
resulting numbers.
(ii) The right hand side is obtained by adding all the entries in each column
and then adding all the resulting numbers.
(iii) Let $A_1, A_2, A_3, \ldots, A_k$ be $m \times n$ matrices and
$c_1, c_2, \ldots, c_k$ be real numbers; then
$\sum_{i=1}^{k} c_i A_i = c_1 A_1 + c_2 A_2 + c_3 A_3 + \cdots + c_k A_k$ is
called a linear combination of the matrices, and $c_1, c_2, c_3, \ldots, c_k$
are called coefficients.
 
Example 18. Let $p = \begin{pmatrix} 20 \\ 100 \\ 200 \end{pmatrix}$ be a
3-vector that represents the current prices of three items at a general
merchant store. Suppose that the store announces a sale so that the price of
each item is reduced by 20%.
(a) Determine a 3-vector that gives the price changes for the three items.
(b) Determine a 3-vector that gives the new prices of the items.
Solution: (a) Since the prices are reduced by 20%, the price changes are given
by the 3-vector
$$-0.20\, p = \begin{pmatrix} -4 \\ -20 \\ -40 \end{pmatrix} = -\begin{pmatrix} 4 \\ 20 \\ 40 \end{pmatrix}.$$
(b) The new prices of the items are given by the expression
$$p - 0.20\, p = \begin{pmatrix} 20 \\ 100 \\ 200 \end{pmatrix} - \begin{pmatrix} 4 \\ 20 \\ 40 \end{pmatrix} = \begin{pmatrix} 16 \\ 80 \\ 160 \end{pmatrix} = 0.80 \begin{pmatrix} 20 \\ 100 \\ 200 \end{pmatrix} = 0.80\, p. \qquad (1.5)$$
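The same computation in code, treating the price vector as a Python list (a minimal sketch):

```python
p = [20, 100, 200]                            # current prices
change = [-0.20 * x for x in p]               # (a) price changes: -20% of each
new_p = [x + c for x, c in zip(p, change)]    # (b) new prices, equal to 0.80 p
print(change)
print(new_p)
```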
Definition 13. Let $A = (a_{ij})$ be an $m \times n$ matrix; then the
transpose of $A$, denoted by $A^T$, is the $n \times m$ matrix defined by
$a_{ij}^T = a_{ji}$. It is clear that the transpose $A^T$ is obtained from
$A$ by interchanging the rows and columns of $A$.
   
Example 19. (i) Let $A = \begin{pmatrix} 4 & 3 & 2 \\ -5 & 6 & 8 \\ 2 & 1 & -1 \end{pmatrix}$; then $A^T = \begin{pmatrix} 4 & -5 & 2 \\ 3 & 6 & 1 \\ 2 & 8 & -1 \end{pmatrix}$.
(ii) Let $B = \begin{pmatrix} 4 & 3 & 1 \\ 0 & 6 & -3 \end{pmatrix}$; then $B^T = \begin{pmatrix} 4 & 0 \\ 3 & 6 \\ 1 & -3 \end{pmatrix}$.
(iii) Let $D = \begin{pmatrix} 2 \\ -3 \\ -1 \end{pmatrix}$; then $D^T = \begin{pmatrix} 2 & -3 & -1 \end{pmatrix}$.
(iv) Let $E = \begin{pmatrix} 2 & 5 & 7 \end{pmatrix}$; then $E^T = \begin{pmatrix} 2 \\ 5 \\ 7 \end{pmatrix}$.
The concept of $\mathbb{R}^3$ can be extended to any finite dimension $n$:
the set of all $n$-vectors is denoted by $\mathbb{R}^n$, and its elements may
be written as column matrices. If
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \text{ and } b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},$$
then the dot product (or inner product) of $a$ and $b$, denoted by
$a \cdot b$, is defined as
$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i.$$
   
Example 20. (i) Let $a = \begin{pmatrix} 1 \\ 3 \\ 1 \\ -1 \end{pmatrix}$ and $b = \begin{pmatrix} 2 \\ -3 \\ 2 \\ -1 \end{pmatrix}$; then $a \cdot b = 2 - 9 + 2 + 1 = -4$.
(ii) Let $a = \begin{pmatrix} x \\ 2 \\ 3 \end{pmatrix}$ and $b = \begin{pmatrix} 8 \\ 2 \\ 4 \end{pmatrix}$. If $a \cdot b = -8$, find $x$.
Solution:
$$a \cdot b = 8x + 4 + 12 = -8 \implies 8x = -24 \implies x = -3.$$

Example 21. Suppose that an instructor uses four grades to determine a
student's course average: quizzes, two hourly exams and a final exam, weighted
as 10%, 20%, 20% and 50% respectively. If a student achieved grades of 75, 85,
65 and 95 respectively, find the course average.
Solution: If $u = \begin{pmatrix} 0.10 \\ 0.20 \\ 0.20 \\ 0.50 \end{pmatrix}$
and $v = \begin{pmatrix} 75 \\ 85 \\ 65 \\ 95 \end{pmatrix}$, then the course
average is given by the dot product $u \cdot v$, that is,
$$u \cdot v = (0.10)(75) + (0.20)(85) + (0.20)(65) + (0.50)(95) = 7.5 + 17 + 13 + 47.5 = 85.$$
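Weighted averages like this are just dot products, as the following Python sketch shows (the helper name `dot` is ours; the grades are those of the example statement):

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

weights = [0.10, 0.20, 0.20, 0.50]   # quizzes, exam 1, exam 2, final
grades = [75, 85, 65, 95]
average = dot(weights, grades)       # 7.5 + 17 + 13 + 47.5 = 85
print(average)
```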

Multiplication of two matrices
Definition 14. If $A = (a_{ij})$ is an $m \times p$ matrix and $B = (b_{ij})$
is a $p \times n$ matrix, then the product of $A$ and $B$, denoted by $AB$,
is the $m \times n$ matrix $C = (c_{ij})$ defined by
$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + a_{i3} b_{3j} + \cdots + a_{ip} b_{pj} = \sum_{k=1}^{p} a_{ik} b_{kj}, \quad 1 \leq i \leq m, \; 1 \leq j \leq n. \qquad (1.6)$$

Remark 6. The product of two matrices A and B is defined only when the
number of rows of B is exactly the same as the number of columns of A.
 
Example 22. (i) Let $A = \begin{pmatrix} 1 & 1 & -1 \\ 3 & 2 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} -2 & 4 \\ 4 & -3 \\ 1 & 2 \end{pmatrix}$. Find $AB$.
(ii) Let $A = \begin{pmatrix} 1 & x & 3 \\ 2 & -1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 4 \\ 8 \\ y \end{pmatrix}$. If $AB = \begin{pmatrix} 24 \\ 12 \end{pmatrix}$, find $x$ and $y$.
(iii) Let $A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 0 & 0 \end{pmatrix}$. Show that $AB \neq BA$.
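Formula (1.6) can be implemented directly as a nested loop or comprehension. A Python sketch (the function name is ours) that also confirms part (iii), $AB \neq BA$:

```python
def mat_mul(A, B):
    """Product of an m x p and a p x n matrix given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Example 22(iii)
A = [[1, 2], [-1, 3]]
B = [[2, 1], [0, 0]]
print(mat_mul(A, B))   # [[2, 1], [-2, -1]]
print(mat_mul(B, A))   # [[1, 7], [0, 0]] -> AB != BA
```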

Properties of Matrix Addition, Multiplication and Transpose
Properties of Matrix Addition
Let $A$, $B$, $C$ be $m \times n$ matrices; then
(a) $A + B = B + A$.
(b) $A + (B + C) = (A + B) + C$.
(c) There is a unique $m \times n$ matrix
$$O = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$
such that $A + O = A$. $O$ is called the zero matrix.
(d) For each $m \times n$ matrix $A$, there is a unique $m \times n$ matrix
$D$ such that $A + D = O$. We write $D = -A$, so that $A + (-A) = O$. $-A$ is
called the negative of $A$; each entry of $-A$ is $(-1)$ times the
corresponding entry of $A$.
Properties of Matrix Multiplication
If $A$, $B$, $C$ are matrices of the appropriate sizes, then
(a) $A(BC) = (AB)C$.
(b) $(A + B)C = AC + BC$.
(c) $C(A + B) = CA + CB$.
Properties of Scalar Multiplication
If $\alpha$ and $\beta$ are real numbers and $A$ and $B$ are matrices of size
$m \times n$, then
(a) $\alpha(\beta A) = (\alpha\beta)A$.
(b) $(\alpha + \beta)A = \alpha A + \beta A$.
(c) $\alpha(A + B) = \alpha A + \alpha B$.
(d) $A(\alpha B) = \alpha(AB)$.

Properties of Transpose
If $A$ and $B$ are matrices of appropriate sizes and $\alpha$ is a scalar
(real number), then
(a) $(A^T)^T = A$.
(b) $(A + B)^T = A^T + B^T$.
(c) $(AB)^T = B^T A^T$.
(d) $(\alpha A)^T = \alpha A^T$.
Remark 7. (i) As mentioned earlier, $AB$ need not be equal to $BA$.
(ii) $AB$ may be the zero matrix with $A \neq O$ and $B \neq O$.
(iii) $AB$ may be equal to $AC$ with $B \neq C$.
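Property (c) of the transpose, $(AB)^T = B^T A^T$, is worth verifying on a concrete pair of matrices; note the reversal of order. A Python sketch (the helper names and the sample matrices are ours, not from the text):

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*A)]

def mat_mul(A, B):
    """Matrix product via row-by-column dot products."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 0], [3, -1, 4]]       # 2 x 3 (sample values)
B = [[2, 1], [0, 5], [-2, 3]]     # 3 x 2
lhs = transpose(mat_mul(A, B))             # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))  # B^T A^T
print(lhs == rhs)   # True
```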

1.2.1 Linear Equations and Matrices


In this section we show that solving the linear system of m equations in
n unknowns is equivalent to solving a matrix
 equation
 of the
 form Ax = b,
x1 b1
 x2   b2 
where A = (aij ) is an m × n matrix, x =  .  and b =  . .
   
 ..   .. 
xn bn
Let us consider the linear system of m equations in n unknowns

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. .. .. .. ..
. . . . .
am1 x1 + am2 x2 + · · · + amn xn = bm .

Let

A = [a11 a12 . . . a1n; a21 a22 . . . a2n; . . . ; am1 am2 . . . amn],
x = [x1; x2; . . . ; xn],  b = [b1; b2; . . . ; bm].

Then

Ax = [a11x1 + a12x2 + · · · + a1nxn;
      a21x1 + a22x2 + · · · + a2nxn;
      . . . ;
      am1x1 + am2x2 + · · · + amnxn]                                  (1.7)

   = [b1; b2; . . . ; bm].

Thus, we have
Ax = b.
The matrix A is called the coefficient matrix of the linear system, and the
matrix
[a11 a12 . . . a1n | b1; a21 a22 . . . a2n | b2; . . . ; am1 am2 . . . amn | bm],
obtained by adjoining the column b to A, is called the augmented matrix of
the linear system. The augmented matrix of the linear system is denoted by
(A|b). If b1 = b2 = · · · = bm = 0 in (1.7), then the linear system is called
a homogeneous system. A homogeneous system is written as Ax = 0, where A is
its coefficient matrix.
Example 23. (a) Write the augmented matrix of the linear system:

−2x + z = 7
2x + 3y − 4z = 9
3x + 2y + 2z = 5.

(b) Ax = b is called consistent if b can be expressed as a linear combination


of columns of matrix A. Give an example.
     
Solution: (a) A = [−2 0 1; 2 3 −4; 3 2 2], x = [x; y; z], b = [7; 9; 5].
The augmented matrix is

(A|b) = [−2 0 1 | 7; 2 3 −4 | 9; 3 2 2 | 5].
 
(b) Let Ax = b where A = [3 1 2; 4 −5 6; 0 7 −3; −1 2 0], x = [x1; x2; x3]
and b = [6; 5; 4; 1]. Taking x1 = x2 = x3 = 1,

x1[3; 4; 0; −1] + x2[1; −5; 7; 2] + x3[2; 6; −3; 0] = [6; 5; 4; 1] = b,

so b is a linear combination of the columns of A. Hence Ax = b is consistent.

1.2.2 Rank of Matrix


Definition 15. The rank of m × n matrix A denoted by rank A is the
maximum number of linearly independent row vectors in A.
 
Example 24. Find the rank of A = [1 1 −1 3; 2 −2 6 8; 3 5 −7 8].

Solution: Let a = (1, 1, −1, 3), b = (2, −2, 6, 8) and c = (3, 5, −7, 8).
It is clear that 4a − (1/2)b − c = 0, which implies that a, b, c are linearly
dependent; a and b are linearly independent as they are not constant
multiples of each other. Hence rank A = 2.
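The rank can also be found mechanically by Gaussian elimination: it equals the number of non-zero rows that survive the reduction. A minimal Python sketch (illustrative only; exact rational arithmetic avoids round-off):

```python
from fractions import Fraction

# Rank = number of non-zero rows after forward Gaussian elimination.
def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]   # bring pivot row up
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Example 24: the three rows are linearly dependent, so the rank is 2.
A = [[1, 1, -1, 3],
     [2, -2, 6, 8],
     [3, 5, -7, 8]]
print(rank(A))   # 2
```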

1.2.3 Special Classes of Matrices


Here we present special forms of matrices known as the diagonal matrix, zero
matrix, identity matrix, scalar matrix, self-adjoint matrix, triangular
matrix, symmetric matrix, skew-symmetric matrix, nonsingular (invertible)
matrix, sub-matrix, partitioned matrix, orthogonal matrix and diagonalizable
matrix.
 
A = [0 0 0 . . . 0; 0 0 0 . . . 0; . . . ; 0 0 0 . . . 0], having all 0
entries, is called a zero matrix.
A = (aij ) is called a diagonal matrix if aij = 0 for i 6= j.
 
For example, A = [3 0 0; 0 2 0; 0 0 4] is a diagonal matrix.
A scalar matrix is a diagonal matrix whose diagonal elements are equal.
The scalar matrix In = (dij ), where dij = 1 for i = j and dij = 0 for i 6= j is
called the n × n identity matrix.
 
A = [2 0 0; 0 3 0; 0 0 4] is a diagonal matrix.
B = [3 0 0; 0 3 0; 0 0 3] is a scalar matrix.
I3 = [1 0 0; 0 1 0; 0 0 1] is the identity matrix.

Let A be an m × n matrix; then AIn = A and Im A = A, where Ik denotes the
k × k identity matrix.
An n × n matrix A = (aij ), i = 1, 2, 3, . . . , n, j = 1, 2, 3, . . . , n is called upper
triangular if aij = 0 for i > j. It is called lower triangular if aij = 0 for
i < j. It may be noted that a diagonal matrix is both upper triangular and lower
triangular.
 
Example 25. The matrix A = [1 5 5; 0 5 6; 0 0 2] is an upper triangular
matrix and B = [5 0 0; 5 6 0; 5 6 3] is lower triangular.

A matrix A = (aij ) with real entries is called symmetric if AT = A. It is


called skew symmetric if AT = −A.
   
For A = [1 1 2; 1 −3 3; 2 3 1], AT = [1 1 2; 1 −3 3; 2 3 1] and A = AT, so A
is symmetric.
The matrix B = [0 1; −1 0] is skew symmetric as

BT = [0 −1; 1 0] = −[0 1; −1 0] = −B.

If we begin with an m × n matrix A = (aij) and then cross out some, but not
all, of its rows or columns, we obtain a sub-matrix.
Let A = [2 4 6 8; −4 8 −6 10; 6 0 10 −6]. Then B = [2 4 6 8; 6 0 10 −6] is a
sub-matrix of A, as is C = [2 4 6; 6 0 10].

Non-singular Matrices

Definition 16. An n × n matrix A is called non-singular or invertible, if


there is an n × n matrix B such that AB = BA = In . B is called an inverse
of A if it has this property. If there is no such B then A is called singular or
non-invertible.

Remark 8. If A and B are n × n matrices such that AB = In , then BA = In


and hence B = A−1 . For verification, see Kolman and Hill [5].
Example 26. Let A = [2 3; 2 2] and B = [−1 3/2; 1 −1]. It can be checked
that AB = BA = I2, which implies that B = A−1.
Remark 9. The inverse of a matrix, if it exists, is unique.
Verification: Let B and C be inverses of A. Then

AB = BA = In ,  AC = CA = In .

B = BIn = B(AC) = (BA)C = In C = C.

This verifies that the inverse of a matrix is unique if it exists. Clearly
AA−1 = A−1 A = In .
 
Example 27. Find A−1 for A = [2 −1; 1 2] by writing A−1 = [a b; c d] and
solving AA−1 = I2:

AA−1 = [2 −1; 1 2][a b; c d]
     = [2a − c  2b − d; a + 2c  b + 2d]
     = [1 0; 0 1],

which gives four equations:

2a − c = 1
a + 2c = 0
2b − d = 0
b + 2d = 1.

From these four equations we get

a = 2/5,  b = 1/5,  c = −1/5,  d = 2/5.
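Solving such a pair of 2 × 2 systems amounts to the familiar closed-form inverse of a 2 × 2 matrix: for A = [a b; c d] with ad − bc ≠ 0, A−1 = (1/(ad − bc))[d −b; −c a]. A short sketch, applied for instance to A = [2 −1; 1 2]:

```python
from fractions import Fraction

# Closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is singular"
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[2, -1],
     [1, 2]]
Ainv = inverse_2x2(A)   # equals [[2/5, 1/5], [-1/5, 2/5]]
```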

Orthogonal Matrix: A real matrix A is called orthogonal if AT A = AAT = In,
or equivalently AT = A−1.
Diagonalizable Matrix: An n × n matrix A is diagonalizable if there is an
invertible matrix P and an n × n diagonal matrix D such that P −1AP = D. We
say that P diagonalizes A and that P is a diagonalizing matrix for A.
The matrices [0 1 0; 1 0 0; 0 0 1], [−12/13 5/13 0; 5/13 12/13 0; 0 0 1] and
[0 1/√5 2/√5; 1 0 0; 0 2/√5 −1/√5] are orthogonal matrices.

1.2.4 Echelon Form of Matrix


Definition 17. An m × n matrix A is said to be in reduced row echelon
form if it satisfies the following properties:
(a) All zero rows appear at the bottom of the matrix.
(b) The first non-zero entry from the left of a non-zero row is a 1. This
entry is called the leading one of its row.
(c) For each non-zero row, the leading one appears to the right of and below
any leading ones in preceding rows.
(d) If a column contains a leading one, then all other entries in that
column are zero.
Remark 10. A matrix in reduced row echelon form appears as a staircase
(echelon) pattern of leading ones descending from the upper left corner of
the matrix. An m × n matrix satisfying only properties (a), (b) and (c) is
said to be in row echelon form. There may be no zero rows at all.
A similar definition may be given for a reduced column echelon form and
column echelon form.
Examples of matrices in reduced row echelon form are:
A = [1 2 0 0 1; 0 0 1 2 3; 0 0 0 0 0] and
B = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1].
Definition 18. An elementary row (column) operation on matrix A is any
one of the following:
(a) Type I: Interchange any two rows (columns).
(b) Type II: Multiply a row (column) by a non-zero number.
(c) Type III: Add a multiple of one row (column) to another.
An m × n matrix B is said to be row (column) equivalent to an m × n
matrix A if B can be produced by applying a finite sequence of elementary
row (column) operations to A.

Theorem 2. Every non-zero m × n matrix A = (aij ) is row (column) equiv-


alent to a matrix in row (column) echelon form.

The first column having a non-zero entry is called the pivot column.


Theorem 3. Every non-zero m × n matrix A = (aij ) is row equivalent to a
unique matrix in reduced row (column) echelon form.
Theorem 4. Let Ax = b and Cx = d be two linear systems, each of m
equations in n unknowns. If augmented matrices (A|b) and (C|d) are row
equivalent, then the linear systems are equivalent, that is, they have exactly
the same solution.
For proof of these theorems we refer to [5].
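The reduction promised by Theorems 2 and 3 can be carried out mechanically. Below is a Gauss-Jordan sketch in Python (illustrative only, not the book's algorithm; exact rational arithmetic keeps every leading entry exactly 1):

```python
from fractions import Fraction

# Reduced row echelon form by Gauss-Jordan elimination.
def rref(M):
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, lead = len(M), len(M[0]), 0
    for r in range(rows):
        if lead >= cols:
            break
        pivot = next((i for i in range(r, rows) if M[i][lead] != 0), None)
        while pivot is None:              # skip all-zero columns
            lead += 1
            if lead >= cols:
                return M
            pivot = next((i for i in range(r, rows) if M[i][lead] != 0), None)
        M[r], M[pivot] = M[pivot], M[r]   # bring the pivot row up
        M[r] = [x / M[r][lead] for x in M[r]]          # leading one
        for i in range(rows):             # clear the pivot column
            if i != r and M[i][lead] != 0:
                f = M[i][lead]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        lead += 1
    return M

M = [[1, 2, 3],
     [2, 4, 7],
     [3, 6, 10]]
R = rref(M)   # [[1, 2, 0], [0, 0, 1], [0, 0, 0]]
```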

1.3 Applications of Matrices to Real World Problems


1.3.1 Modeling of Temperature Distribution by Matrices
A simple model for estimating the temperature distribution on a square
plate can be written in the form of a linear system of equations which can
be expressed in the form of a matrix equation. Thus finding distribution of
temperature on a square plate is equivalent to solving a matrix equation.
Example 28. Determine the temperature at the interior points Ti , i =
1, 2, 3, 4 for the plate shown in Figure 1.5.
First we construct the linear system for estimating the temperatures. The
points at which we want to estimate temperature are indicated by dots in
Figure 1.5. Using the averaging rule we get
T1 = (60 + 100 + T2 + T3)/4   or   4T1 − T2 − T3 = 160.

T2 = (T1 + 100 + 40 + T4)/4   or   −T1 + 4T2 − T4 = 140.

T3 = (60 + T1 + T4 + 0)/4     or   −T1 + 4T3 − T4 = 60.

T4 = (T3 + T2 + 40 + 0)/4     or   −T2 − T3 + 4T4 = 40.

FIGURE 1.5: Temperature Distribution (interior points T1, T2 on the top row
and T3, T4 on the bottom row)

This leads to the augmented matrix for this linear system:

(A|b) = [4 −1 −1 0 | 160; −1 4 0 −1 | 140; −1 0 4 −1 | 60; 0 −1 −1 4 | 40].

By solving the model Ax = y where

A = [4 −1 −1 0; −1 4 0 −1; −1 0 4 −1; 0 −1 −1 4] and y = [160; 140; 60; 40],

we get x = (T1, T2, T3, T4) where T1 = 65°, T2 = 60°, T3 = 40°, T4 = 35°.
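The averaging rule itself suggests one way to compute the answer: start from any guess and repeatedly replace each interior temperature by the average of its four neighbours (Jacobi iteration). The fixed point of this process is exactly the solution of the 4 × 4 system (a sketch):

```python
# Jacobi iteration on the averaging rule; the tuple assignment uses
# the *old* values on the right-hand side (simultaneous update).
T1 = T2 = T3 = T4 = 0.0
for _ in range(200):
    T1, T2, T3, T4 = ((60 + 100 + T2 + T3) / 4,
                      (T1 + 100 + 40 + T4) / 4,
                      (60 + T1 + T4 + 0) / 4,
                      (T3 + T2 + 40 + 0) / 4)

print(round(T1), round(T2), round(T3), round(T4))   # 65 60 40 35
```

The iteration converges here because each equation's diagonal coefficient (4) dominates the row; 200 sweeps are far more than enough for machine precision.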



1.3.2 Modeling of Traffic Flow by Matrices


At rush hours, traffic congestion is encountered at the street intersections
shown in Figure 1.6.

FIGURE 1.6: Traffic Flow

1. Corner A
700 cars an hour come down Nehru Street to intersection A.
300 cars an hour come down 8th Street to intersection A.
2. Corner B
200 cars an hour leave intersection B on Nehru Street.
900 cars an hour leave intersection B on 9th Street.
3. Corner C
400 cars an hour enter intersection C on Akbar Street.
300 cars an hour come down 10th Street to intersection C.
4. Corner D
200 cars an hour leave intersection D on Akbar Street.
400 cars an hour leave intersection D on 8th Street to intersection A.
Let x1 denote the number of cars leaving corner A on Nehru Street towards
corner B.
Let x2 denote the number of cars arriving to corner B on 9th Street from
corner C.
Let x3 denote the number of cars leaving corner C on Akbar Street towards

corner D.
Let x4 denote the number of cars arriving to corner D on 8th Street from
corner A.
Assumptions. To solve this problem we assume the following:
1. To speed the traffic flow, every car that arrives at a given corner must
also leave; hence at any corner, the number of cars arriving is equal to
the number of cars leaving.

2. All streets are one-way.


3. All variables, x1 , x2 , x3 and x4 , are positive integers since they represent
numbers of cars.
Equation: Using assumption 1 for each corner, we obtain the following equa-
tions:

At corner A:  x1 + x4 = 700 + 300
At corner B:  x1 + x2 = 900 + 200
At corner C:  x2 + x3 = 400 + 300
At corner D:  x3 + x4 = 400 + 200

These four equations form a system of linear equations that can be solved
using the Gauss-Jordan method (row reduction of the augmented matrix).

x1 + x4 = 1000
x1 + x2 = 1100
x2 + x3 = 700
x3 + x4 = 600

 
A = [1 0 0 1; 1 1 0 0; 0 1 1 0; 0 0 1 1],   y = [1000; 1100; 700; 600].

By solving the matrix equation, we get x1 , x2 , x3 , x4 .
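The four corner equations are dependent (the first plus the third equals the second plus the fourth), so one variable is free. Taking x4 = t as the free parameter, the general solution can be checked directly (a sketch):

```python
# General solution of the traffic system with free parameter t = x4.
def traffic(t):
    x1 = 1000 - t
    x2 = 100 + t
    x3 = 600 - t
    return x1, x2, x3, t

# Every value of t satisfies all four corner equations:
for t in (0, 100, 250):
    x1, x2, x3, x4 = traffic(t)
    assert x1 + x4 == 1000 and x1 + x2 == 1100
    assert x2 + x3 == 700 and x3 + x4 == 600
```

Since all the xi must be non-negative integers (assumption 3), the physically meaningful flows are those with 0 ≤ t ≤ 600.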



1.3.3 Matrices for Chemical Balance Equations


Mathematical equations may describe chemical reactions. In a chemical
reaction the two sides are separated by an arrow instead of an equals sign.
Let sodium hydroxide (NaOH) react with sulfuric acid (H2SO4) to form sodium
sulfate (Na2SO4) and water (H2O). The chemical reaction is denoted as
NaOH + H2SO4 → Na2SO4 + H2O.
To balance this equation we insert unknowns x, y, z and w to get an equation
xNaOH + yH2SO4 → zNa2SO4 + wH2O.
By comparing the number of sodium (Na), oxygen (O), hydrogen (H) and
sulfur (S) atoms on the left and right sides, we get four linear equations:
Na: x = 2z
O: x + 4y = 4z + w
H: x + 2y = 2w
S: y = z
It may be noted that we make use of the subscripts in writing this system of
equations because they indicate the numbers of atoms of the particular
elements. Thus we have a homogeneous linear system Ax = 0 in four unknowns,
where

A = [1 0 −2 0; 1 4 −4 −1; 1 2 0 −2; 0 1 −1 0],
or this matrix equation can be written as the augmented matrix

[1 0 −2 0 | 0; 1 4 −4 −1 | 0; 1 2 0 −2 | 0; 0 1 −1 0 | 0].
The reduced row echelon form of this augmented matrix is

[1 0 0 −1 | 0; 0 1 0 −1/2 | 0; 0 0 1 −1/2 | 0; 0 0 0 0 | 0].

This gives us x = w, y = (1/2)w and z = (1/2)w. We choose w = 2 so that all
unknowns x, y, z are positive integers (x = 2, y = 1 and z = 1). Thus the
balanced equation is
2NaOH + H2SO4 → Na2SO4 + 2H2O.
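Because the coefficients must be small positive integers, the balanced equation can also be found by a brute-force search over the atom-balance conditions derived above (an illustrative sketch):

```python
from itertools import product

# Atom-balance conditions for x NaOH + y H2SO4 -> z Na2SO4 + w H2O.
def balanced(x, y, z, w):
    return (x == 2 * z and              # Na
            x + 4 * y == 4 * z + w and  # O
            x + 2 * y == 2 * w and      # H
            y == z)                     # S

# Smallest positive solution found by exhaustive search:
solution = next((x, y, z, w)
                for x, y, z, w in product(range(1, 6), repeat=4)
                if balanced(x, y, z, w))
print(solution)   # (2, 1, 1, 2)
```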

1.3.4 Modeling by Matrix Equation in Business


Example 29. A construction company uses three types of steel A1 , A2 and
A3 for constructing three types of houses H1 , H2 and H3 . The steel required
for each type of house is given below:

TABLE 1.2

Steel H1 H2 H3
A1 2 3 4
A2 1 1 2
A3 3 2 1

Find the number of houses of each type that can be constructed using
29, 13 and 16 tons of steel A1, A2 and A3 respectively.
Solution: Let x, y and z denote the number of houses that can be constructed
of each type. Then we have three equations:

2x + 3y + 4z = 29.
x + y + 2z = 13.
3x + 2y + z = 16.

This system of three equations can be written as a matrix equation Ax = y,
where A = [2 3 4; 1 1 2; 3 2 1] and y = [29; 13; 16], that is,

[2 3 4; 1 1 2; 3 2 1][x; y; z] = [29; 13; 16].

This equation can be solved by the Gauss-Jordan elimination method. By
applying the operation R1 ↔ R2 (row R1 is interchanged with R2) the above
system takes the form:

[1 1 2; 2 3 4; 3 2 1][x; y; z] = [13; 29; 16].

Now applying R2 → R2 − 2R1 and R3 → R3 − 3R1, the above system is equivalent
to

[1 1 2; 0 1 0; 0 −1 −5][x; y; z] = [13; 3; −23].

Applying R3 → R3 + R2 makes the above system equivalent to

[1 1 2; 0 1 0; 0 0 −5][x; y; z] = [13; 3; −20]
or

x + y + 2z = 13
y = 3
−5z = −20.

Therefore, z = 4, y = 3 and putting these values in the first equation we get


x = 2.
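The Gauss-Jordan steps above can be carried out mechanically (an illustrative sketch using exact fractions; not the book's code):

```python
from fractions import Fraction

# Gauss-Jordan elimination on the augmented matrix [A | y].
def solve(A, y):
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, y)]
    n = len(M)
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]                # leading one
        for i in range(n):                                # clear column c
            if i != c:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[c])]
    return [row[-1] for row in M]

# Example 29: number of houses of each type.
A = [[2, 3, 4], [1, 1, 2], [3, 2, 1]]
y = [29, 13, 16]
sol = solve(A, y)   # sol == [2, 3, 4], i.e. x = 2, y = 3, z = 4
```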
Example 30. A manufacturing company produces 3 items every day. The
total production on a particular day is 45 tons. It is observed that the produc-
tion of the third product exceeds the production of the first by 8 tons while
the total production of the first and third products is thrice that of the second
product. Determine the level of products of each item.
Solution: Let the production levels of the three items be x, y and z respec-
tively. This situation is modeled by the equations

x + y + z = 45
z =x+8 or −x + 0y + z = 8
x + z = 2y or x − 2y + z = 0.

The matrix equation is Au = v, where

A = [1 1 1; −1 0 1; 1 −2 1],   u = [x; y; z],   v = [45; 8; 0].
By solving this matrix equation we get

x = 11, y = 15, z = 19.

Production level of first item =11


Production level of second item =15
Production level of third item =19.

1.3.5 Role of Matrices in Electrical Networks


The electrical network shown in Figure 1.7 contains batteries (voltage
sources) and resistors (light bulbs or heaters) joined together by conductors
(wires).
A voltage source provides an electromotive force E measured in volts,
which moves electrons through the network. The rate at which electrons flow
along a conductor is called current I and is measured in amperes (amps). The
resistor acts to retard the flow of electrons, using up the current and lowering
the voltage. According to Ohm’s law, the potential difference E volts across a
resistor is given by E = IR, where I is the current measured in amperes and
R is the resistance measured in ohms (Ω).

FIGURE 1.7: Electrical Network

Network analysis is based on Kirchhoff’s two laws that apply to the closed
voltage loops and nodes in the network. A node is a junction of three or
more conductors. The network in Figure 1.7 has two nodes B and D and three
closed voltage loops defined as
A→D→B→A (Loop ADBA)
C→B→D→C (Loop CBDC)
A→D→C→B→A (Loop ADCBA).
If R11, R12, . . . , Rmm and V1, V2, . . . , Vm are constants, then by
applying Kirchhoff's laws to each loop the electric network can be written
as a matrix equation Ax = y, where

A = [R11 R12 . . . R1m; R21 R22 . . . R2m; . . . ; Rm1 Rm2 . . . Rmm],
 
y = [v1; v2; . . . ; vm],   x = [I1; I2; . . . ; Im].

Example 31. Set up and solve the system of equations (matrix equations)
of the networks given in Figure 1.8 and Figure 1.9.

FIGURE 1.8: Electrical Network

FIGURE 1.9: Electrical Network

For the network in Figure 1.8 the system of equations is

−i1 + i2 − i3 = 0
10 − 3i1 + 5i3 = 0
27 − 6i2 − 5i3 = 0

or

−i1 + i2 − i3 = 0
3i1 − 5i3 = 10
6i2 + 5i3 = 27.

Gaussian elimination gives

[−1 1 −1 | 0; 3 0 −5 | 10; 0 6 5 | 27]  → (row operations) →
[1 −1 1 | 0; 0 1 −8/3 | 10/3; 0 0 1 | 1/3].

The solution is i1 = 35/9, i2 = 38/9, i3 = 1/3.
For the network in Figure 1.9 the system of equations is

i1 − i2 − i3 = 0
52 − i1 − 5i2 = 0
5i2 − 10i3 = 0

or

i1 − i2 − i3 = 0
i1 + 5i2 = 52
5i2 − 10i3 = 0.

Gaussian elimination gives

[1 −1 −1 | 0; 1 5 0 | 52; 0 5 −10 | 0]  → (row operations) →
[1 −1 −1 | 0; 0 1 1/6 | 26/3; 0 0 1 | 4].

The solution is i1 = 12, i2 = 8, i3 = 4.
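A quick check with exact rational arithmetic confirms that the stated currents satisfy each network's loop and node equations (the middle equation of the second network, i1 + 5i2 = 52, is the one read off from its echelon form):

```python
from fractions import Fraction as F

# First network: i1 = 35/9, i2 = 38/9, i3 = 1/3.
i1, i2, i3 = F(35, 9), F(38, 9), F(1, 3)
assert -i1 + i2 - i3 == 0
assert 3 * i1 - 5 * i3 == 10
assert 6 * i2 + 5 * i3 == 27

# Second network: i1 = 12, i2 = 8, i3 = 4.
j1, j2, j3 = 12, 8, 4
assert j1 - j2 - j3 == 0
assert j1 + 5 * j2 == 52
assert 5 * j2 - 10 * j3 == 0
```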

1.4 Determinants
1.4.1 Introduction
Let A be an n × n matrix. Associated with A is a number called the
determinant of A, denoted by detA. Symbolically, we distinguish a matrix A

from its determinant by replacing the parentheses by vertical bars:


 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
A= .
 
. .. .. .. 
 . . . . 
an1 an2 · · · ann
and

a11
a12 ··· a1n

a21 a22 ··· a2n
detA = . .

.. .. ..
.. . . .

an1 an2 ··· ann

A determinant of an n × n matrix is said to be a determinant of order n.


We begin by defining the determinant of 1 × 1, 2 × 2 and 3 × 3 matrices.

1.4.2 Determinants of Matrices


Definition 19. The determinant of a 1 × 1 matrix A = (a), denoted by detA,
is defined by
detA = a.
If A = (−4), then detA = −4.
 
Definition 20. The determinant of a 2 × 2 matrix A = [a11 a12; a21 a22] is
defined by

detA = |a11 a12; a21 a22| = a11a22 − a12a21.

Let A = [5 −1; 5 4]. Then

detA = |5 −1; 5 4| = 20 − (−5) = 25.
 
Definition 21. Let A = [a11 a12 a13; a21 a22 a23; a31 a32 a33]. Then

detA = |a11 a12 a13; a21 a22 a23; a31 a32 a33|
     = a11(a22a33 − a23a32) + a12(−a21a33 + a23a31) + a13(a21a32 − a22a31)
     = a11|a22 a23; a32 a33| − a12|a21 a23; a31 a33| + a13|a21 a22; a31 a32|.

Thus detA = a11c11 + a12c12 + a13c13, where

c11 = |a22 a23; a32 a33|,  c12 = −|a21 a23; a31 a33|,  c13 = |a21 a22; a31 a32|.

The cofactor of aij is in general the determinant

cij = (−1)^(i+j) Mij                                                  (1.8)

where Mij is the determinant of the sub-matrix obtained by deleting the ith
row and jth column of A.
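Cofactor expansion along the first row extends to any order and gives a simple, if inefficient, recursive algorithm. A sketch (illustrative only; the 3 × 3 test matrix is that of Example 33 below):

```python
# Determinant by cofactor expansion along the first row:
# det A = sum_j (-1)^(1+j) * a_1j * M_1j, recursing on the minors M_1j.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

A = [[1, 1, 1],
     [2, -1, 1],
     [3, 1, -1]]
print(det(A))   # 10
```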
Example 32. Find detA and detB if
A = [−5 0 0; 0 7 0; 0 0 3] and B = [0 7 0; 4 0 0; 0 0 −2].

Solution:

detA = |−5 0 0; 0 7 0; 0 0 3| = −5(21 − 0) − 0 + 0 = −105.
detB = |0 7 0; 4 0 0; 0 0 −2| = 0(0 − 0) − 7(−8 − 0) + 0 = 56.

1.4.3 Properties of Determinants


We state properties of determinants in the forms of the following theorems
Theorem 5. detA = detAT, where AT denotes the transpose of the n × n matrix
A = (an×n ).
Theorem 6. If two rows or columns of an n × n matrix A = (an×n ) are the
same, then detA = 0.
Theorem 7. detA = 0 if all entries of the matrix A = (an×n ) are zero.
Theorem 8. Let B be a matrix obtained by interchanging any two columns
(rows) of an n × n matrix A = (an×n ), then detB = −detA.
Theorem 9. detB = kdetA, where B is a matrix obtained by multiplying by
k (non-zero real number) all entries of a column or row.
Theorem 10. Let A and B be n × n matrices, then

det(AB) = det(A)det(B).

Theorem 11. Let A be an n × n triangular matrix (upper or lower). Then

detA = a11 a22 . . . ann

where a11 , a22 . . . and ann are the entries on the main diagonal of A.
Proofs of Theorems, 6, 7, 9 and 11 can be found in Appendix C.
Remark 11. Physical meaning of determinants in R2 and R3:
In R2 the determinant |a1 a2; b1 b2| is an indicator of whether the vectors
(a1, a2) and (b1, b2) are co-linear. If the determinant is zero, then the
vectors (a1, a2) and (b1, b2) are co-linear.
For a = (a1, a2, a3), b = (b1, b2, b3), c = (c1, c2, c3) in R3, the
determinant |a1 a2 a3; b1 b2 b3; c1 c2 c3| equals (a × b)·c, the volume of
the parallelepiped in R3 spanned by a, b and c. A determinant of zero means
the volume is zero and the three vectors are coplanar.
 
Example 33. Let A = [1 1 1; 2 −1 1; 3 1 −1]. Then detA = detAT.

Solution: AT = [1 2 3; 1 −1 1; 1 1 −1].

detA = 1(1 − 1) − 1(−2 − 3) + 1(2 + 3) = 0 + 5 + 5 = 10.
detAT = 1(1 − 1) − 2(−1 − 1) + 3(1 + 1) = 0 + 4 + 6 = 10.

Hence detA = detAT.


   
Example 34. Let A = [2 −1 1; 3 1 −1; 0 2 2] and B = [2 1 5; 4 3 8; 0 −1 0].
Show that detAB = detA detB.

Solution: AB = [0 −2 2; 10 7 23; 8 4 16].

detAB = 0 + 2(160 − 184) + 2(40 − 56) = −80.
detA = 20 and detB = −4, so detA detB = −80.
 
Example 35. Let A = [5 1 1; 5 1 1; 3 2 7]. Then detA = 0.

Solution: detA = 5(7 − 2) − 1(35 − 3) + 1(10 − 3) = 25 − 32 + 7 = 0.
 
Example 36. Let A = [1 2 5; 0 0 0; 3 4 6]. Then detA = 0.
Solution: detA = 1(0 − 0) − 2(0 − 0) + 5(0 − 0) = 0.
   
Example 37. Let A = [2 3 1; 0 1 0; 1 2 3] and B = [0 1 0; 2 3 1; 1 2 3].
Then −detA = detB.

Solution: detA = 2(3 − 0) − 3(0 − 0) + 1(0 − 1) = 6 − 1 = 5.
detB = 0(9 − 2) − 1(6 − 1) + 0(4 − 3) = −5.
Thus −detA = detB.
 
Example 38. Let A = [4 0 0; 3 2 0; 5 1 −1]. Then detA = −8.

Solution: detA = 4(−2 − 0) − 0(−3 − 0) + 0(3 − 10) = −8, which is the
product 4 · 2 · (−1) of the diagonal entries, as Theorem 11 predicts for a
triangular matrix.
 
Example 39. Show that detA = 0, where A = [3 2 1; 2 6 3; 5 −8 −4], using
properties of determinants.
Solution: The second column of A is twice the column (1, 3, −4), so

A = [3 2·1 1; 2 2·3 3; 5 2·(−4) −4]

and detA = 2 detB, where B = [3 1 1; 2 3 3; 5 −4 −4].
This holds by Theorem 9, and detB = 0 by Theorem 6 since the second and
third columns of B are the same. Hence detA = 0.
   
Example 40. Let A = [3 −4; 1 2] and B = [7 4; −1 −5]. Show that
det(A + B) ≠ detA + detB.

Solution: det(A + B) = |10 0; 0 −3| = −30.
detA = 6 + 4 = 10 and detB = −35 + 4 = −31, so detA + detB = −21.

Hence det(A + B) ≠ detA + detB.

Example 41. Let A be an n × n matrix such that A2 = I, where A2 = A·A
and I is the identity matrix. Show that detA = ±1.
Solution: By Theorem 10,

det(A·A) = detA2 = detA · detA.

Since A2 = I, (detA)2 = detI = 1, so detA = ±1.

1.4.4 Cramer’s Rule: Application of Determinants to Solve


Matrix Equations
Swiss mathematician Gabriel Cramer (1704-1752) discovered a method to
solve a linear system of n equations in n unknowns using determinants. This

method is known as Cramer’s rule.


For a system of n linear equations in n unknowns, Ax = y, where

A = [a11 a12 . . . a1n; a21 a22 . . . a2n; . . . ; an1 an2 . . . ann],
x = [x1; x2; . . . ; xn],   y = [b1; b2; . . . ; bn],

or

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
···························
an1x1 + an2x2 + · · · + annxn = bn.

If detA ≠ 0, then the solution of the matrix equation is given by

x1 = detA1/detA,  x2 = detA2/detA,  . . . ,  xn = detAn/detA,

where Ak, k = 1, 2, 3, . . . , n, is the matrix A with its kth column
replaced by the entries of the column matrix y = [b1; b2; . . . ; bn]:

Ak = [a11 . . . a1(k−1) b1 a1(k+1) . . . a1n;
      a21 . . . a2(k−1) b2 a2(k+1) . . . a2n;
      . . . ;
      an1 . . . an(k−1) bn an(k+1) . . . ann].
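Cramer's rule translates directly into code: build each Ak by swapping the column y into position k, then divide determinants. An illustrative sketch (valid only when detA ≠ 0; the determinant routine is the recursive cofactor expansion):

```python
# Recursive determinant by cofactor expansion along the first row.
def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] *
               det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

# Cramer's rule: x_k = det(A_k) / det(A).
def cramer(A, y):
    d = det(A)
    assert d != 0, "Cramer's rule needs a non-singular A"
    xs = []
    for k in range(len(A)):
        Ak = [row[:k] + [y[i]] + row[k + 1:] for i, row in enumerate(A)]
        xs.append(det(Ak) / d)
    return xs

# Example 42(ii): x1 + x2 = 4, 2x1 - x2 = 2.
print(cramer([[1, 1], [2, -1]], [4, 2]))   # [2.0, 2.0]
```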

Example 42. Solve the following systems of equations applying Cramer's rule:

(i) a11x1 + a12x2 = b1
    a21x1 + a22x2 = b2.

(ii) x1 + x2 = 4
     2x1 − x2 = 2.

(iii) x1 − x2 + 6x3 = −2
      −x1 + 2x2 + 4x3 = 9
      2x1 + 3x2 − x3 = 1/2.

Solution: (i) The system is Ax = y, where A = [a11 a12; a21 a22],
x = [x1; x2], y = [b1; b2], and

detA = |a11 a12; a21 a22| = a11a22 − a12a21.
A1 = [b1 a12; b2 a22],  detA1 = |b1 a12; b2 a22| = b1a22 − a12b2.
A2 = [a11 b1; a21 b2],  detA2 = |a11 b1; a21 b2| = a11b2 − b1a21.

x1 = detA1/detA = (b1a22 − a12b2)/(a11a22 − a12a21),
x2 = detA2/detA = (a11b2 − b1a21)/(a11a22 − a12a21).

(ii) a11 = 1, a12 = 1, b1 = 4, a21 = 2, a22 = −1, b2 = 2:

x1 = (4(−1) − 1 × 2)/(−1 − 1 × 2) = −6/−3 = 2

and

x2 = (1 × 2 − 4 × 2)/(−3) = −6/−3 = 2.
(iii) A = [1 −1 6; −1 2 4; 2 3 −1] and b = [−2; 9; 1/2].

detA = −63,  detA1 = |−2 −1 6; 9 2 4; 1/2 3 −1| = 173,  detA2 = −136,
detA3 = −61/2,  so  x1 = detA1/detA = −173/63,  x2 = 136/63,  x3 = 61/126.
Example 43. The magnitudes T1 and T2 of the tensions in the support wires
shown in Figure 1.10 satisfy the equations:

cos(25o )T1 − cos(15o )T2 = 0


sin(25o )T1 + sin(15o )T2 = 300.

Use Cramer’s rule to find T1 and T2 .

FIGURE 1.10: Tension Distribution (300 kg load)

Solution: A = [cos(25°) −cos(15°); sin(25°) sin(15°)], y = b = [0; 300].

detA = |cos(25°) −cos(15°); sin(25°) sin(15°)| = sin(40°) ≈ 0.6428
detA1 = |0 −cos(15°); 300 sin(15°)| = 300 cos(15°) ≈ 289.8
detA2 = |cos(25°) 0; sin(25°) 300| = 300 cos(25°) ≈ 271.9

T1 = detA1/detA ≈ 289.8/0.6428 ≈ 450.8
T2 = detA2/detA ≈ 271.9/0.6428 ≈ 423.
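The arithmetic of Example 43 can be verified numerically (a sketch; the angles are converted to radians before using the trigonometric functions):

```python
from math import cos, sin, radians

# Cramer's rule for the tension system of Example 43.
a = cos(radians(25)); b = -cos(radians(15))
c = sin(radians(25)); d = sin(radians(15))

detA = a * d - b * c                 # = sin(40 degrees) ~ 0.6428
T1 = (0 * d - b * 300) / detA        # det A1 / det A
T2 = (a * 300 - 0 * c) / detA        # det A2 / det A
print(round(T1, 1), round(T2, 1))   # 450.8 423.0
```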

1.5 Inverse of Matrix and Its Computation


In the real number system a real number b is called the inverse of a if
ab = 1. Similarly a matrix B is called the inverse of a matrix A and is denoted
by B = A−1 if AB = BA = I, where I denotes the identity matrix. A is called
a non-singular or invertible matrix if A−1 exists.
We know that every non-zero real number is invertible, but this is untrue
for matrices: there are non-zero matrices which are not invertible. For
invertible matrices the matrix equation Ax = y can be easily solved. The
main goal of this section is to give a characterization of invertible
matrices in terms of their determinants, and a formula for the inverse in
terms of the determinant and adjoint.
Properties of Inverse
Theorem 12. Let A and B be invertible (non-singular) matrices
(i) (A−1 )−1 = A.
(ii) (AB)−1 = B −1 A−1 .
(iii) (AT )−1 = (A−1 )T .
Characterization of Invertible Matrices:
Theorem 13. An n × n matrix A is invertible if and only if detA ≠ 0.
We require the following formula for computing inverses in the proof of
Theorem 13.
Lemma 1. Let A be an n × n matrix with detA ≠ 0, and let the adjoint of A,
denoted by adjA, be the transpose of its matrix of cofactors:

adjA = [c11 c12 . . . c1n; c21 c22 . . . c2n; . . . ; cn1 cn2 . . . cnn]^T
     = [c11 c21 . . . cn1; c12 c22 . . . cn2; . . . ; c1n c2n . . . cnn].

Then

A−1 = (1/detA) adjA.

Remark 12. For a 3 × 3 invertible matrix

A = [a11 a12 a13; a21 a22 a23; a31 a32 a33],

c11 = |a22 a23; a32 a33|,  c12 = −|a21 a23; a31 a33|,  c13 = |a21 a22; a31 a32|,

and

A−1 = (1/detA)[c11 c21 c31; c12 c22 c32; c13 c23 c33].
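The adjugate formula of Lemma 1 and Remark 12 can be sketched for any order as "transpose of the cofactor matrix, divided by the determinant" (illustrative code; the test matrix A is the one used below in Example 44, whose inverse is the matrix B given there):

```python
from fractions import Fraction

# Recursive determinant by cofactor expansion along the first row.
def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] *
               det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

# Inverse from the adjugate: A^{-1} = adj(A) / det(A).
def inverse(A):
    n, d = len(A), det(A)
    assert d != 0, "singular matrix"
    def cof(i, j):   # cofactor c_ij = (-1)^(i+j) * M_ij
        minor = [r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(minor)
    # adj(A) is the transpose of the cofactor matrix, so entry (i, j)
    # of the inverse is c_ji / det(A).
    return [[Fraction(cof(j, i), d) for j in range(n)] for i in range(n)]

A = [[1, -1, 0],
     [3, 0, 2],
     [1, 1, 1]]
Ainv = inverse(A)   # equals [[2, -1, 2], [1, -1, 2], [-3, 2, -3]]
```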

Proof of Theorem 13. Let detA ≠ 0. Then A is invertible by Lemma 1, as
A−1 = (1/detA) adjA can be computed. Conversely, assume that A is
invertible, so A−1 exists and AA−1 = A−1A = I. By Theorem 10:

det(I) = det(AA−1) = detA · det(A−1)

or

detA · detA−1 = 1 ≠ 0.

This means that detA ≠ 0.


 
Example 44. Show that the matrix B = [2 −1 2; 1 −1 2; −3 2 −3] is the
inverse of the matrix A = [1 −1 0; 3 0 2; 1 1 1].
Solution: We are required to show that AB = I. We have

AB = [1 −1 0; 3 0 2; 1 1 1][2 −1 2; 1 −1 2; −3 2 −3]
   = [2 − 1  −1 + 1  −2 + 2; 6 − 6  −3 + 4  6 − 6; 2 + 1 − 3  −1 − 1 + 2  2 + 2 − 3]
   = [1 0 0; 0 1 0; 0 0 1] = I.

So B is the inverse of A.

Example 45. Examine whether the following matrices are invertible
(non-singular) or non-invertible (singular). If invertible, compute the
inverses.
(i) [6 0; −3 2].
(ii) [−2π −π; −π π].
(iii) [0 2 0; 0 0 1; 8 0 0].
(iv) [5 3; 5 3].
(v) [1 2 3; 0 1 4; 0 0 8].
 
Solution: (i) Since det[6 0; −3 2] = 12 ≠ 0, this matrix is invertible by
Theorem 13. By Lemma 1,

A−1 = (1/12)[2 0; 3 6] = [1/6 0; 1/4 1/2].

(ii) det[−2π −π; −π π] = −2π² − π² = −3π² ≠ 0. By Lemma 1 it is invertible
and its inverse is

(1/(−3π²))[π π; π −2π] = [−1/(3π) −1/(3π); −1/(3π) 2/(3π)].

(iii) Let A = [0 2 0; 0 0 1; 8 0 0]. Then
a11 = 0, a12 = 2, a13 = 0, a21 = 0, a22 = 0, a23 = 1, a31 = 8, a32 = 0,
a33 = 0, and the cofactors of the first row are

c11 = |0 1; 0 0| = 0,  c12 = −|0 1; 8 0| = 8,  c13 = |0 0; 8 0| = 0.

detA = 0(0 − 0) − 2(0 − 8) + 0 = 16 ≠ 0, so A is invertible, and

A−1 = (1/16) adjA = [0 0 1/8; 1/2 0 0; 0 1 0].
 
(iv) Since detA, where A = [5 3; 5 3], is zero, A is singular by Theorem 13.

(v) It is invertible as det[1 2 3; 0 1 4; 0 0 8] = 8 ≠ 0 and

A−1 = [1 −2 5/8; 0 1 −1/2; 0 0 1/8].

1.6 Eigenvalue Problems for Square Matrices


In this section we discuss eigenvalues and eigenvectors of square matrices.
In fact similar studies could be carried out for linear transformations in
general and matrix transformations in particular. Let A be an m × n matrix
and u be a non-zero n-vector; then the matrix product Au is an m-vector. A
mapping T on Rn into Rn is denoted by T : Rn → Rn. A matrix transformation
is a function T : Rn → Rn defined by T u = Au, where A is n × n. The vector
T u in Rn is called the image of u, and the set of all images of vectors in
Rn is called the range of T.
The subset of Rn on which T is defined is called the domain of T . We limit
discussion to matrices and vectors with real entries but similar discussion is
possible for complex entries. The main goal is to find those vectors u for which
Au and u have the same direction, that is, those vectors for which Au = λu
where A is an n × n matrix, u is called an eigenvector and λ is called an
eigenvalue. Eigenvalues and eigenvectors are also known as characteristic
values or proper values and characteristic vectors or proper vectors.
The concepts of eigenvalues and eigenvectors have been utilized in different
branches of engineering such as control theory, vibration analysis (modeling a
mechanical system or the energy states of an atom), electric circuits, advanced
dynamics and quantum mechanics.

1.6.1 Eigenvalues and Eigenvectors


Definition 22. Let A be an n × n matrix. A number λ is called an eigenvalue
of A if there is a non-zero vector u such that

Au = λu. (1.9)

The solution vector u of the linear system (1.9) is an eigenvector correspond-


ing to the eigenvalue λ.

Remark 13. (i) Non-zero scalar multiples of eigenvectors are also eigen-
vectors.
(ii) The geometrical meaning of eigenvectors is that the image of eigenvector
u under A is parallel to u.

(iii) According to definition, a zero vector cannot be an eigenvector, but an


eigenvalue can be 0.
(iv) Let 0 be an eigenvalue of an n × n matrix A. Then Au = 0u = 0 has a
non-zero solution u, which implies that A is not invertible. A is invertible
if and only if 0 is not an eigenvalue of A.

Procedure for finding eigenvalues and eigenvectors:

Au = λu or Au − λu = 0.
(Au − λIu) = 0.
(A − λI)u = 0.

λ is an eigenvalue of A if and only if

det(A − λI) = 0.                                                      (1.10)

Equation (1.10) is called the characteristic equation of A. The
eigenvalues of A are the roots of the characteristic equation. To find an
eigenvector corresponding to an eigenvalue λ, solve

(A − λI)u = 0.                                                        (1.11)

Example 46. Find eigenvalues and associated eigenvectors of the following
matrices:
(i) A = [1 2; 2 4]   (ii) B = [3 4; −1 7]   (iii) C = [1 −2; 2 0]
(iv) D = [−3 1 1; 0 0 0; 0 1 0]   (v) E = [0 0 −1; 1 0 0; 1 1 −1].

Solution: (i) det(A − λI) = |1 − λ  2; 2  4 − λ| = (1 − λ)(4 − λ) − 4
= λ² − 5λ = 0.
This yields eigenvalues λ1 = 0 and λ2 = 5. For λ1 = 0,

(A − 0I)u = [1 2; 2 4][u1; u2] = [0; 0]
   
implies u1 + 2u2 = 0, so [u1; u2] = [2; −1] is an eigenvector corresponding
to λ1 = 0. Similarly, solving (A − 5I)u = 0 gives [1; 2] as an eigenvector
corresponding to λ2 = 5.
(ii) The characteristic equation of B = [ 3 4 ; −1 7 ] is

det(B − λI) = | 3−λ 4 ; −1 7−λ | = (3 − λ)(7 − λ) + 4 = λ² − 10λ + 25 = (λ − 5)² = 0.

We see that λ1 = λ2 = 5 is an eigenvalue of multiplicity 2. Then

(B − 5I)u = [ −2 4 ; −1 2 ] [ u1 ; u2 ] = 0,   or   −2u1 + 4u2 = 0, −u1 + 2u2 = 0.

It is clear that u1 = 2u2. Thus if we choose u2 = 1 we find the single eigenvector

u1 = [ 2 ; 1 ].

(iii) The characteristic equation of C is

det(λI − C) = | λ−1 2 ; −2 λ | = λ² − λ + 4 = 0.

The eigenvalues are the roots of this equation:

λ1 = (1 + √15 i)/2 and λ2 = (1 − √15 i)/2.

An eigenvector corresponding to λ1 is obtained by solving (λ1 I − C)u = 0:

((−1 + √15 i)/2) u1 + 2u2 = 0
−2u1 + ((1 + √15 i)/2) u2 = 0.

The general solution of this system is α [ 1 ; (1 − √15 i)/4 ], and this is an eigenvector associated with the eigenvalue (1 + √15 i)/2 for any non-zero scalar α. Similarly, β [ 1 ; (1 + √15 i)/4 ] is an eigenvector corresponding to the eigenvalue (1 − √15 i)/2 for any β ≠ 0.
2
 
(iv) D = [ −3 1 1 ; 0 0 0 ; 0 1 0 ]:

det(D − λI) = | −3−λ 1 1 ; 0 −λ 0 ; 0 1 −λ | = (−3 − λ)λ² − 1(0) + 1(0) = −(3 + λ)λ² = 0.

The eigenvalues are λ1 = λ2 = 0 and λ3 = −3. For the double eigenvalue λ1 = λ2 = 0, solving Du = 0 gives u2 = 0 and u3 = 3u1, so u1 = [ 1 ; 0 ; 3 ] is the only eigenvector. For λ3 = −3, u2 = [ 1 ; 0 ; 0 ] is the only eigenvector.
 
(v) E = [ 0 0 −1 ; 1 0 0 ; 1 1 −1 ]:

det(E − λI) = | −λ 0 −1 ; 1 −λ 0 ; 1 1 −1−λ | = −(λ + 1)(λ² + 1) = 0,

so the eigenvalues are λ1 = −1, λ2 = i and λ3 = −i. For λ1 = −1 we have

[ 1 0 −1 | 0 ; 1 1 0 | 0 ; 1 1 0 | 0 ] ⇒ [ 1 0 −1 | 0 ; 0 1 1 | 0 ; 0 0 0 | 0 ]

so that u1 = u3 and u2 = −u3. If u3 = 1 then u = [ 1 ; −1 ; 1 ].

For λ2 = i we have

[ −i 0 −1 | 0 ; 1 −i 0 | 0 ; 1 1 −1−i | 0 ] ⇒ [ 1 0 −i | 0 ; 0 1 −1 | 0 ; 0 0 0 | 0 ]

so that u1 = iu3 and u2 = u3. If u3 = 1 then v = [ i ; 1 ; 1 ] is an eigenvector for λ2 = i, and its conjugate w = [ −i ; 1 ; 1 ] is an eigenvector for λ3 = −i.
Definition 23. Let A be an n × n matrix and let Aᵀ denote its transpose. A is called symmetric if A = Aᵀ; A is called an orthogonal matrix if AᵀA = I.
Remark 14. (i) It is clear that an n × n invertible matrix A is orthogonal if and only if A⁻¹ = Aᵀ.
 
(ii) Let A = (aij) be an n × n matrix. Then A is an orthogonal matrix if and only if its columns

X1 = (a11, a21, …, an1)ᵀ, X2 = (a12, a22, …, an2)ᵀ, …, Xn = (a1n, a2n, …, ann)ᵀ

form an orthonormal set.
(iii) Let A be a symmetric matrix with real entries. Then eigenvalues of A
are real.
(iv) The eigenvectors corresponding to distinct eigenvalues of an n × n sym-
metric matrix are orthogonal.
Example 47. (i) Show that the identity matrix is orthogonal.

(ii) Let A = [ 0 −1 0 ; −1 −1 1 ; 0 1 0 ]. Show that A is symmetric, show that its eigenvectors associated with its three distinct eigenvalues are orthogonal, and construct an orthogonal matrix related to A.

Solution: (i) I = Iᵀ, and IᵀI = I·I = I. Hence I is orthogonal.


(ii) It can be checked that λ1 = 0, λ2 = 1 and λ3 = −2, with corresponding eigenvectors

u1 = [ 1 ; 0 ; 1 ], u2 = [ −1 ; 1 ; 1 ], u3 = [ 1 ; 2 ; −1 ].

Since all the eigenvalues are distinct, the eigenvectors are orthogonal:

u1ᵀu2 = 1·(−1) + 0·1 + 1·1 = 0,
u1ᵀu3 = 1·1 + 0·2 + 1·(−1) = 0,
u2ᵀu3 = (−1)·1 + 1·2 + 1·(−1) = 0.

The norms of u1, u2 and u3 are respectively

‖u1‖ = √(u1ᵀu1) = √2, ‖u2‖ = √(u2ᵀu2) = √3, ‖u3‖ = √(u3ᵀu3) = √6.

Thus an orthonormal set of vectors is

[ 1/√2 ; 0 ; 1/√2 ], [ −1/√3 ; 1/√3 ; 1/√3 ], [ 1/√6 ; 2/√6 ; −1/√6 ].

Hence

P = [ 1/√2 −1/√3 1/√6 ; 0 1/√3 2/√6 ; 1/√2 1/√3 −1/√6 ]

is an orthogonal matrix. We can check that Pᵀ = P⁻¹.
Example 48. Determine whether the given matrix is orthogonal:

(i) [ 0 1 0 ; 1 0 0 ; 0 0 1 ]   (ii) [ 0 0 0 ; 0 0 0 ; 0 0 0 ].

Example 49. Construct an orthogonal matrix from eigenvectors of the given symmetric matrix:

(i) [ 7 0 ; 0 4 ]   (ii) [ 0 1 1 ; 1 1 1 ; 1 1 0 ].
Properties of orthogonal and symmetric matrices:

(i) An n × n matrix A is orthogonal if and only if AT is orthogonal.


(ii) The determinant of an n × n orthogonal matrix A is +1 or −1, that is,
detA = ±1.
(iii) Let A be a real symmetric matrix. Then eigenvectors associated with distinct eigenvalues are orthogonal.
(iv) Let A be a real symmetric matrix. Then there is a real orthogonal matrix that diagonalizes A.
(v) An n × n matrix A is orthogonal if and only if the row vectors and
column vectors form an orthonormal set of vectors in Rn .
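As a numerical check of these properties, the following plain-Python sketch (illustrative, not from the text) normalizes the eigenvectors found in Example 47(ii), uses them as the columns of P, and verifies PᵀP = I:

```python
import math

def matmul(X, Y):
    # plain triple-loop matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

# eigenvectors of the symmetric matrix in Example 47(ii)
vecs = [[1, 0, 1], [-1, 1, 1], [1, 2, -1]]
# normalize each and use them as the columns of P
cols = [[x / math.sqrt(sum(c * c for c in v)) for x in v] for v in vecs]
P = transpose(cols)

PtP = matmul(transpose(P), P)
ok = all(abs(PtP[i][j] - (1 if i == j else 0)) < 1e-12
         for i in range(3) for j in range(3))
print(ok)   # True: P^T P = I, so P is orthogonal
```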

1.6.2 Diagonalization and Similar Matrices


Definition 24. A square matrix having all off-diagonal elements equal to zero is called a diagonal matrix.

[ 1 0 ; 0 1 ],  [ 1 0 0 ; 0 1 0 ; 0 0 1 ],  [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ]

are examples of diagonal matrices.

Properties of diagonal matrices:
Let A = diag(α1, α2, …, αn) and B = diag(β1, β2, …, βn). Then

(i) AB = BA = diag(α1β1, α2β2, …, αnβn).

(ii) |A| = α1 α2 . . . αn .
(iii) A is invertible if and only if each main diagonal element is non-zero.
(iv) If each αj ≠ 0, then

A⁻¹ = diag(1/α1, 1/α2, …, 1/αn).

(v) The eigenvalues of A are its main diagonal elements α1, α2, …, αn.

(vi) An eigenvector associated with αj is the vector with 1 in row j and all other entries zero.


Definition 25. The n × n matrices A and B are said to be similar if there is an invertible n × n matrix P such that A = PBP⁻¹.
Properties:
(i) Similar n × n matrices A and B have the same characteristic polynomial and hence the same eigenvalues (with the same multiplicities).
(ii) Two row-equivalent matrices A and B are not necessarily similar. Consider for example A = [ 2 0 ; 0 1 ] and B = [ 1 0 ; 0 1 ]: A and B are row equivalent but they are not similar, since B is the identity matrix, so A = PBP⁻¹ = PP⁻¹ = I would force A to be the identity matrix, a contradiction.
Definition 26. A square matrix A is said to be diagonalizable if it is similar to a diagonal matrix. In other words, for a diagonalizable matrix A there exist an invertible matrix P and a diagonal matrix D such that A = PDP⁻¹.
Remark 15. (i) If A is diagonalizable, then

A³ = PD³P⁻¹

and in general

Aᵏ = PDᵏP⁻¹, k = 1, 2, 3, 4, · · · .

Verification:

A³ = (PDP⁻¹)³ = (PDP⁻¹)(PDP⁻¹)(PDP⁻¹) = PD(P⁻¹P)D(P⁻¹P)DP⁻¹ = PDDDP⁻¹ = PD³P⁻¹.

(ii) If D = diag(7, −2, 3), then D³ = diag(7³, (−2)³, 3³).

(iii) An n × n matrix is diagonalizable if and only if it has n linearly independent eigenvectors.
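The identity Aᵏ = PDᵏP⁻¹ can be checked numerically. The sketch below (illustrative Python; the matrices P and D are hand-picked by us) builds A = PDP⁻¹ and compares A³ computed directly with PD³P⁻¹:

```python
def matmul(X, Y):
    # plain triple-loop matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# A = P D P^{-1} with D = diag(2, 3); P and its inverse chosen by hand
P     = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]
D     = [[2, 0], [0, 3]]

A = matmul(matmul(P, D), P_inv)

# A^3 the slow way ...
A3_direct = matmul(matmul(A, A), A)
# ... and via A^3 = P D^3 P^{-1}, where D^3 just cubes the diagonal entries
D3 = [[2**3, 0], [0, 3**3]]
A3_diag = matmul(matmul(P, D3), P_inv)

print(A3_direct == A3_diag)   # True
```

Raising D to a power only cubes the diagonal entries, which is why this factorization makes large powers of A cheap to compute.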

Theorem 14. (diagonalization theorem)

(i) An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

(ii) If v1, v2, …, vn are linearly independent eigenvectors of A and λ1, λ2, …, λn are their corresponding eigenvalues, then A = PDP⁻¹, where

P = (v1, v2, …, vn) and D = diag(λ1, λ2, …, λn).

(iii) If A = PDP⁻¹ and D is a diagonal matrix, then the columns of P must be linearly independent eigenvectors of A and the diagonal entries of D must be their corresponding eigenvalues.

Remark 16. Theorem 14 tells us that if we can find n linearly independent


eigenvectors for n × n matrix A, then it is diagonalizable. Furthermore, we
can use those eigenvectors and their corresponding eigenvalues to find the
invertible matrix P and diagonal matrix D required to show that A is diago-
nalizable.

1.6.3 Orthogonalization and Diagonalization of 2×2 Matrices


 
We discuss here the orthogonality of 2 × 2 matrices A = [ a b ; c d ] and their diagonalization. In order that A be orthogonal, the rows (a, b) and (c, d) must be orthogonal and must have length 1, that is,

ac + bd = 0.      (1.12)
a² + b² = 1.      (1.13)
c² + d² = 1.      (1.14)
Case 1: ad − bc = 1. Multiply Equation (1.12) by d to get

acd + bd² = 0.

Substitute ad = 1 + bc into this equation to get

c(1 + bc) + bd² = 0,   or   c + b(c² + d²) = 0.

But c² + d² = 1, so c + b = 0, i.e., c = −b. Inserting this value in (1.12) yields −ab + bd = 0, or b(a − d) = 0. This means either b = 0 or a = d.

Case 1(a): b = 0. Then c = −b = 0, so

A = [ a 0 ; 0 d ].

But each row vector has length 1, so a² = 1 = d². Further, det A = ad = 1 in the present case, so a = d = 1 or a = d = −1. In these cases A = I₂ or A = −I₂.

Case 1(b): b ≠ 0. Then a = d, so

A = [ a b ; −b a ].

Since a² + b² = 1, there is some θ in [0, 2π] such that a = cos θ, b = sin θ. Then

A = [ cos θ sin θ ; − sin θ cos θ ].
This includes the two results of Case 1(a) by choosing θ = 0 or θ = π.
Case 2: ad − bc = −1. By an analysis similar to that above, we find that for some θ

A = [ cos θ sin θ ; sin θ − cos θ ].

These two cases give all the 2 × 2 orthogonal matrices. For θ = π/4 we get the orthogonal matrices

[ 1/√2 1/√2 ; −1/√2 1/√2 ] and [ 1/√2 1/√2 ; 1/√2 −1/√2 ],

and for θ = π/6 we get the orthogonal matrices

[ √3/2 1/2 ; −1/2 √3/2 ] and [ √3/2 1/2 ; 1/2 −√3/2 ].

We can recognize the orthogonal matrices

[ cos θ sin θ ; − sin θ cos θ ]

as rotations in the plane. If the positive x, y system is rotated counterclockwise by θ radians to obtain a new x′, y′ system, the coordinates in the two systems are related by

[ x′ ; y′ ] = [ cos θ sin θ ; − sin θ cos θ ] [ x ; y ].
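The rotation form of Case 1(b) can be verified numerically. This short Python sketch (illustrative, not part of the text) checks AᵀA = I for several angles θ:

```python
import math

def rotation(theta):
    # the orthogonal matrix [[cos t, sin t], [-sin t, cos t]] of Case 1(b)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def is_orthogonal(A, tol=1e-12):
    # check A^T A = I for a 2x2 matrix
    at_a = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return all(abs(at_a[i][j] - (1 if i == j else 0)) < tol
               for i in range(2) for j in range(2))

print(all(is_orthogonal(rotation(t))
          for t in (0, math.pi / 6, math.pi / 4, 1.0)))   # True
```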
Diagonalization of a 2 × 2 real symmetric matrix
Consider the most general real symmetric 2 × 2 matrix

A = [ a c ; c b ]

where a, b and c are arbitrary real numbers. We compute the eigenvalues and eigenvectors of A, and then find the real orthogonal matrix that diagonalizes A.
The eigenvalues are the roots of the characteristic equation:

| a−λ c ; c b−λ | = (a − λ)(b − λ) − c² = λ² − λ(a + b) + (ab − c²) = 0.

The two roots, λ1 and λ2, can be determined from the quadratic formula. Noting that (a + b)² − 4(ab − c²) = (a − b)² + 4c², the two roots can be written as:

λ1 = (1/2)[a + b + √((a − b)² + 4c²)] and λ2 = (1/2)[a + b − √((a − b)² + 4c²)]      (1.15)

where by convention λ1 ≥ λ2.
Since (a − b)² + 4c² ≥ 0 (as the sum of two squares must be non-negative), Equation (1.15) implies that λ1 and λ2 are real. We next work out the two eigenvectors and demonstrate that they are orthogonal. It is convenient to define

D ≡ √((a − b)² + 4c²).      (1.16)

We first solve the eigenvalue equation,

[ a c ; c b ] [ x ; y ] = (1/2)(a + b + D) [ x ; y ].
This yields two equations:

ax + cy = (1/2)(a + b + D)x,
cx + by = (1/2)(a + b + D)y,

which can be rewritten as:

(1/2)(a − b − D)x + cy = 0      (1.17)
cx + (1/2)(b − a − D)y = 0.      (1.18)

One can show that Equation (1.18) is a multiple of Equation (1.17), since the rank of the matrix A − λ1 I is one. Simply multiply Equation (1.18) by (a − b − D)/(2c) to obtain

(1/2)(a − b − D)x + (a − b − D)(b − a − D)y/(4c) = (1/2)(a − b − D)x + [D² − (a − b)²]y/(4c) = 0.

Using Equation (1.16), D² − (a − b)² = 4c², and the above equation reduces to

(1/2)(a − b − D)x + cy = 0,
which is equivalent to Equation (1.17). Solving for y yields

y = (b − a + D)x/(2c),

which means that the eigenvector corresponding to eigenvalue λ1 is given by

[ x ; y ]₁ = (x/(2c)) [ 2c ; b − a + D ].

Since λ2 differs from λ1 by changing the sign of D, it follows without further computation that the eigenvector corresponding to eigenvalue λ2 is given by

[ x ; y ]₂ = (x/(2c)) [ 2c ; b − a − D ].

To show that the two eigenvectors are orthogonal, we evaluate the dot product of (x, y)₁ and (x, y)₂, which is equal to x1x2 + y1y2. Inserting the corresponding vector components, we end up with:

(x²/(4c²))[4c² + (b − a + D)(b − a − D)] = (x²/(4c²))[4c² + (a − b)² − D²] = (x²/(4c²))[4c² − 4c²] = 0,

after making use of D² − (a − b)² = 4c² [cf. Equation (1.16)].


We now propose to find the real orthogonal matrix that diagonalizes A. The most general 2 × 2 real orthogonal matrix S with determinant equal to 1 must have the following form:

S = [ cos θ − sin θ ; sin θ cos θ ].

Using this result, we determine θ in terms of a, b and c such that

S⁻¹AS = [ λ1 0 ; 0 λ2 ]

where λ1 and λ2 are the eigenvalues of A obtained in Equation (1.15). The most straightforward approach is to compute S⁻¹AS explicitly. Since the off-diagonal terms must vanish, one obtains a constraint on the angle θ:

S⁻¹AS = [ cos θ sin θ ; − sin θ cos θ ] [ a c ; c b ] [ cos θ − sin θ ; sin θ cos θ ]      (1.19)
      = [ cos θ sin θ ; − sin θ cos θ ] [ a cos θ + c sin θ  −a sin θ + c cos θ ; c cos θ + b sin θ  −c sin θ + b cos θ ]
      = [ a cos²θ + 2c cos θ sin θ + b sin²θ   (b − a) cos θ sin θ + c(cos²θ − sin²θ) ;
          (b − a) cos θ sin θ + c(cos²θ − sin²θ)   a sin²θ − 2c cos θ sin θ + b cos²θ ]
      = [ λ1 0 ; 0 λ2 ].      (1.20)
The vanishing of the off-diagonal elements of S⁻¹AS implies that:

(b − a) cos θ sin θ + c(cos²θ − sin²θ) = 0.

Using sin 2θ = 2 sin θ cos θ and cos 2θ = cos²θ − sin²θ, we can rewrite the above equation as

(1/2)(b − a) sin 2θ + c cos 2θ = 0.

It follows that:

tan 2θ = 2c/(a − b)      (1.21)

after writing tan 2θ = sin 2θ/cos 2θ.
Let us now consider the range of the angle θ. You might think that 0 ≤ θ < 2π. However, since

cos(θ + π) = − cos θ and sin(θ + π) = − sin θ,

it follows that shifting θ → θ + π simply multiplies S by an overall factor of −1. Thus, S⁻¹AS is unchanged. Hence, without loss of generality, we may assume that 0 ≤ θ < π. Unfortunately, Equation (1.21) does not distinguish between the two intervals 0 ≤ θ < π/2 and π/2 ≤ θ < π, since tan 2θ = tan(2θ + π) is unchanged if θ → θ + π/2.
However, we have not yet used all the available information. In particular, the diagonal elements of Equation (1.20) also provide some information on the possible values of θ. Summing the diagonal terms of the matrices in Equation (1.20) yields:

λ1 + λ2 = (a cos²θ + 2c cos θ sin θ + b sin²θ) + (a sin²θ − 2c cos θ sin θ + b cos²θ) = (a + b)(cos²θ + sin²θ) = a + b,

which is independent of θ. This is not surprising, since we know that

Tr A = λ1 + λ2 = a + b.

However, λ1 − λ2 does depend on θ:

λ1 − λ2 = (a cos²θ + 2c cos θ sin θ + b sin²θ) − (a sin²θ − 2c cos θ sin θ + b cos²θ)      (1.22)
        = (a − b)(cos²θ − sin²θ) + 4c sin θ cos θ
        = (a − b) cos 2θ + 2c sin 2θ.

From Equations (1.15) and (1.22), we obtain

λ1 − λ2 = √((a − b)² + 4c²) = (a − b) cos 2θ + 2c sin 2θ.      (1.23)

Using Equation (1.21) to write

a − b = 2c/tan 2θ = 2c cos 2θ/sin 2θ,

and inserting this on the left-hand side of Equation (1.23), the latter reduces to:

(a − b) cos 2θ + 2c sin 2θ = 2c cos²2θ/sin 2θ + 2c sin 2θ = (2c/sin 2θ)(cos²2θ + sin²2θ) = 2c/sin 2θ.

Substituting this result back into Equation (1.23) and solving for sin 2θ, we find:

sin 2θ = 2c/√((a − b)² + 4c²).      (1.24)
We can also obtain cos 2θ using Equations (1.21) and (1.24):

cos 2θ = (a − b)/√((a − b)² + 4c²).      (1.25)

Equation (1.24) tells us in which quadrant θ lives. If 0 < θ < π/2, then sin 2θ > 0, which implies that c > 0. If π/2 < θ < π, then sin 2θ < 0, which implies that c < 0. Thus, the sign of c determines the quadrant of θ. Equation (1.25) provides additional information. For c > 0, the sign of a − b determines whether 0 < θ < π/4 or π/4 < θ < π/2; the former corresponds to a − b > 0 while the latter corresponds to a − b < 0. Likewise, if c < 0, the sign of a − b determines whether π/2 < θ < 3π/4 or 3π/4 < θ < π; the former corresponds to a − b < 0 while the latter corresponds to a − b > 0. The borderline cases are likewise determined:

c = 0 and a > b ⇒ θ = 0,
c = 0 and a < b ⇒ θ = π/2,
a = b and c > 0 ⇒ θ = π/4,
a = b and c < 0 ⇒ θ = 3π/4.

If c = 0 and a = b, then A = I and it follows that S −1 AS = S −1 S = I,


which is satisfied for any invertible matrix S. Consequently, in this limit θ is
undefined.
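The recipe in Equations (1.21)–(1.25) can be turned into a small numerical sketch. The helper below (illustrative Python; the function name and test matrix are ours) picks θ from tan 2θ = 2c/(a − b) and confirms that the off-diagonal entry of S⁻¹AS vanishes:

```python
import math

def diagonalize_sym2(a, b, c):
    """Rotation angle and the entries of S^{-1} A S for the symmetric matrix
    [[a, c], [c, b]], following (1.19)-(1.21); assumes not (a == b and c == 0)."""
    theta = 0.5 * math.atan2(2 * c, a - b)     # solves tan 2theta = 2c/(a - b)
    ct, st = math.cos(theta), math.sin(theta)
    # entries of S^{-1} A S with S = [[ct, -st], [st, ct]], cf. Equation (1.20)
    m11 = a * ct**2 + 2 * c * ct * st + b * st**2
    m12 = (b - a) * ct * st + c * (ct**2 - st**2)
    m22 = a * st**2 - 2 * c * ct * st + b * ct**2
    return theta, (m11, m12, m22)

theta, (l1, off, l2) = diagonalize_sym2(2.0, 1.0, 1.0)
print(abs(off) < 1e-12)    # True: the off-diagonal term vanishes
```

With a = 2, b = 1, c = 1 the diagonal entries agree with (1.15): λ1 = (3 + √5)/2 and λ2 = (3 − √5)/2.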
Example 50. Consider the rotation Rθ : R² → R² by an angle θ ∈ [0, 2π] given by the matrix

Rθ = [ cos θ − sin θ ; sin θ cos θ ].

We get the eigenvalues by solving the polynomial equation

p(λ) = (cos θ − λ)² + sin²θ = λ² − 2λ cos θ + 1 = 0,

where we have used cos²θ + sin²θ = 1:

λ = cos θ ± √(cos²θ − 1) = cos θ ± √(−sin²θ) = cos θ ± i sin θ = e^{±iθ}.

Thus we see that Rθ has real eigenvalues only when θ = 0 or θ = π. However, if we interpret the vector (x1, x2) ∈ R² as a complex number z = x1 + ix2, then z is an eigenvector of the map Rθ : C → C that sends z → λz = e^{±iθ}z, where C is the set of complex numbers. Multiplication by e^{±iθ} corresponds to rotation by the angle ±θ; see Chapter 8.

1.7 Miscellaneous Applications


In this section we discuss the relevance of matrices in digital image processing, image compression, cryptography, transform coding, information theory and literature.

1.7.1 Digital Image Processing


Photographs taken by mobile cameras and illustrations on internet pages are typical examples of digital images. Such images can be represented by matrices. For example, a small image of the "Felix the Cat" cartoon character can be represented by a 35 × 35 matrix whose elements are the numbers 0 and 1 that specify the color of each pixel: a 0 indicates black and a 1 indicates white. Digital images using only two colors are called binary images or Boolean images. A pixel is the smallest graphical element of a digital image and can take only one color at a time.
Let an image f(x, y) be represented by an n × n matrix

[ f(1,1) f(1,2) … f(1,n) ; f(2,1) f(2,2) … f(2,n) ; … ; f(n,1) f(n,2) … f(n,n) ]      (1.26)

where f(i, j) = aij, i = 1, 2, …, n, j = 1, 2, …, n. Each entry of this matrix is called a pixel, and the appearance of the image varies with the values taken at each (i, j). Gray scale and color images are also represented by matrices. For convenience, most current digital files use integer numbers between 0 (to indicate black, the color of minimal intensity) and 255 (to indicate white, the color of maximal intensity), giving a total of 256 = 2⁸ different levels of gray. (This number of gray levels is appropriate for Web page images, but certain types such as medical images need more levels of gray.)
Color images are represented by three matrices. Each matrix specifies the amount of red, green and blue that make up the image. This color system is known as RGB. There are many other color systems that may be used, depending on the application. The elements of those matrices are integers between 0 and 255 and they determine the intensity of the pixel with respect to the color of the matrix. Thus in the RGB system, it is possible to represent 256³ = 2²⁴ different colors. Once a digital image is represented by matrices, the pertinent question arises how operations on their elements affect the corresponding image.

(i) Let A = (aij) represent the image in Figure 1.11; then the transpose of A, that is Aᵀ = (aji), represents the image in Figure 1.12.

FIGURE 1.11: Image Representation by Matrix

FIGURE 1.12: Image Representation by Transposed Matrix

(ii) Let a color image be represented by matrices A = (aij), B = (bij) and C = (cij). Then we get a gray scale version of the image (non-integer values are rounded to the nearest integer) by taking the standard arithmetic mean of the matrices A, B and C:

(A + B + C)/3 = ((aij + bij + cij)/3).

(iii) The operations of multiplication by a scalar and sum of matrices allow us to create an image transition effect commonly used, for instance, in presentations and slide shows. For example, let M(t) = (1 − t)A + tB, where the matrices A and B represent gray scale images of the same size. It is clear that M(0) = A and M(1) = B and, for each t between 0 and 1, the elements of the matrix M(t) are between the elements of the matrices A and B. Therefore when t varies from 0 to 1, the image varies from A to B.
Important ingredients of digital image processing include image modeling, image enhancement, image smoothing, image restoration, image compression, denoising and transmission of information in a secure environment.
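Operations (ii) and (iii) can be illustrated on tiny matrices. The sketch below (plain Python; the 2 × 2 "images" are made-up values) averages three color channels into a gray scale image and computes the transition M(t) = (1 − t)A + tB:

```python
# Tiny "images" as 2x2 matrices of levels in 0..255 (illustrative values).
R = [[255, 0], [0, 255]]
G = [[0, 255], [0, 255]]
B = [[0, 0], [255, 255]]

# gray level = arithmetic mean of the three channels, rounded to an integer
gray = [[round((R[i][j] + G[i][j] + B[i][j]) / 3) for j in range(2)]
        for i in range(2)]

def transition(A, B, t):
    # M(t) = (1 - t) A + t B: fades from image A (t = 0) to image B (t = 1)
    return [[(1 - t) * A[i][j] + t * B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

print(gray)                      # [[85, 85], [85, 255]]
print(transition(R, B, 0.5))     # halfway between the two images
```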

1.7.2 Matrices in Digital Image Compression


Figures 1.13 and 1.14 show the sequences involved in digital image compression.

FIGURE 1.13: Transform Coding System for Input Image (input n × n image → construct n × n subimages → forward transform → quantization → symbol encoder → compressed image)

FIGURE 1.14: Transform Coding System for Compressed Image (compressed image → symbol decoder → inverse transform → merge n × n subimages → decompressed image)

A compression technique which provides error-free compressed images is called error-free compression. Let E(n, n) denote the error between the original digitized image A and the decompressed image B. Then

E(n, n) = Σ_{i=1}^{n} Σ_{j=1}^{n} [aij − bij].

The mean absolute error is e_ma = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} |aij − bij|.

The mean square error is e_ms = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} [aij − bij]².

The root mean square error is e_rms = √(e_ms).

The mean square signal-to-noise ratio of the output image, denoted by SNR_ms, is defined as

SNR_ms = Σ_{i=1}^{n} Σ_{j=1}^{n} (bij)² / Σ_{i=1}^{n} Σ_{j=1}^{n} [aij − bij]²,

and SNR_rms = √(SNR_ms) is called the root mean square signal-to-noise ratio.
The peak signal-to-noise ratio PSNR is defined as

PSNR = 10 log₁₀ (255 × 255 / e_ms).

A main task of digital image compression is to find methods that give small values of the error measures e_ma and e_ms, and correspondingly large values of SNR_rms and PSNR.
For more details, see Neunzert and Siddiqi [9], Gonzalez and Woods [3], Gomes
and Velho [2] and Lay [7].
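The error measures above can be sketched as follows (illustrative Python; the 2 × 2 "images" are made-up values):

```python
import math

def compression_metrics(A, B):
    """Mean square error, RMS error and PSNR between an original image A
    and its decompressed version B (both n x n lists of gray levels)."""
    n = len(A)
    diffs = [A[i][j] - B[i][j] for i in range(n) for j in range(n)]
    e_ms  = sum(d * d for d in diffs) / (n * n)        # mean square error
    e_rms = math.sqrt(e_ms)                            # root mean square error
    psnr  = 10 * math.log10(255 * 255 / e_ms) if e_ms else float("inf")
    return e_ms, e_rms, psnr

A = [[10, 20], [30, 40]]
B = [[12, 20], [30, 44]]
e_ms, e_rms, psnr = compression_metrics(A, B)
print(e_ms)   # (4 + 0 + 0 + 16) / 4 = 5.0
```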

1.7.3 Cryptography with Matrices


Cryptography is a combination of the Greek words kryptos, meaning hidden or secret, and grapho, meaning writing. Thus cryptography is the study of secret writings or codes. We discuss here a system of encoding and decoding messages that requires both the sender and the receiver to know specified rules of correspondence between a set of symbols (such as letters of the alphabet and punctuation marks) from which messages are composed and a set of integers, together with a specified invertible matrix A.
Problem:
(i) Encode a message into a string of numbers via matrix multiplication.
(ii) Decode the string of numbers into a written message.
Consider the following message:

HI THERE BUDDY

We shall use the number 0 to represent a space and the numbers 1 through 26 to represent the 26 letters. We permute the digits in some way instead of making a direct substitution; see Table 1.3.

TABLE 1.3

A B C D E F G H I  J K  L  M
2 1 4 3 6 5 8 7 10 9 12 11 14

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
13 16 15 18 17 20 19 22 21 24 23 26 25

We choose an invertible n × n matrix which will be used to encode the message. Let us use

A = [ 3 2 3 ; −6 −5 −4 ; 9 8 7 ].

We first convert the message to numeric form using H = 7, I = 10, space = 0, T = 19, H = 7, E = 6, R = 17, E = 6, space = 0, B = 1, U = 22, D = 3, D = 3, Y = 26. This numeric form of the message is

7 10 0 19 7 6 17 6 0 1 22 3 3 26.

Then we convert this string into an m × 3 matrix B (3 columns because our A is 3 × 3). The number of rows m depends on the length of the message; we fill the last row with 0's as necessary:

B = [ 7 10 0 ; 19 7 6 ; 17 6 0 ; 1 22 3 ; 3 26 0 ].

Here B is 5 × 3. Now

C = BA = [ 7 10 0 ; 19 7 6 ; 17 6 0 ; 1 22 3 ; 3 26 0 ] [ 3 2 3 ; −6 −5 −4 ; 9 8 7 ]
       = [ −39 −36 −19 ; 69 51 71 ; 15 4 27 ; −102 −84 −64 ; −147 −124 −95 ].
We convert the matrix C into a string that becomes the coded message; see Table 1.4.

TABLE 1.4

−39 −36 −19    69 51 71 15 4  27    −102 −84 −64 −147 −124 −95
H   I   space  T  H  E  R  E  space B    U   D   D    Y    space

It may be observed that H has been sent to two different values: the first H has been encoded as −39, and the second H as 51. Likewise the three spaces are assigned different values, as are the two E's and the two D's.
The person at the other end receives only the coded message:

−39 −36 −19 69 51 71 15 4 27 −102 −84 −64 −147 −124 −95.
The receiver knows the alpha-numeric code and the coding matrix A. Thus,
the coded script is converted back into an m × 3 matrix C and then multiplied
by A−1 to undo the original matrix multiplication. Since C = BA, we get
B = CA−1 , and then we convert the string back to the alphabet.
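The encode/decode cycle can be sketched in a few lines (illustrative Python; the 2 × 2 key matrix is a hypothetical choice whose inverse also has integer entries, so decoding is exact — it is not the matrix A used above):

```python
# Hill-style encoding sketch with a small invertible key matrix.
KEY     = [[1, 2], [1, 1]]
KEY_INV = [[-1, 2], [1, -1]]      # KEY_INV * KEY = I

def blocks(nums, size=2):
    # pad with 0 (= space) and split into blocks of length `size`
    nums = nums + [0] * (-len(nums) % size)
    return [nums[i:i + size] for i in range(0, len(nums), size)]

def apply_key(M, block):
    # matrix-vector product M * block
    return [sum(M[i][k] * block[k] for k in range(len(block)))
            for i in range(len(M))]

plain   = [19, 5, 14, 4, 0, 8, 5, 12, 16]     # "send help" with a=1, ..., z=26
cipher  = [apply_key(KEY, b) for b in blocks(plain)]
decoded = [n for b in cipher for n in apply_key(KEY_INV, b)]
print(decoded == plain + [0])                 # True: the padded message returns
```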
Example 51. Encode and decode the following messages using the given matrices:

(i) Message: SEND HELP, A = [ 1 2 ; 1 1 ].

(ii) Message: GO NORTH ON MAIN ST, A = [ 1 2 3 ; 1 1 2 ; 0 1 2 ].

A natural correspondence between the first 27 non-negative integers and the letters of the alphabet and the blank space (to separate words) is given by Table 1.5.

TABLE 1.5

0     1 2 3 4 5 6 7 8 9 10 11 12 13
space a b c d e f g h i j  k  l  m

14 15 16 17 18 19 20 21 22 23 24 25 26
n  o  p  q  r  s  t  u  v  w  x  y  z

 
Solution: (i) B = [ 19 5 14 4 0 ; 8 5 12 16 0 ].

C = AB = [ 1 2 ; 1 1 ] [ 19 5 14 4 0 ; 8 5 12 16 0 ] = [ 35 15 38 36 0 ; 27 10 26 20 0 ].

C corresponds to the encoded message. Then

B = A⁻¹C = [ −1 2 ; 1 −1 ] [ 35 15 38 36 0 ; 27 10 26 20 0 ] = [ 19 5 14 4 0 ; 8 5 12 16 0 ]

represents the decoded message.
(ii) B = [ 7 15 0 14 15 18 20 ; 8 0 15 14 0 13 1 ; 9 14 0 19 20 0 0 ].

C = AB = [ 1 2 3 ; 1 1 2 ; 0 1 2 ] B = [ 50 57 30 99 75 44 22 ; 33 43 15 66 55 31 21 ; 26 28 15 52 40 13 1 ].

C represents the encoded message. Then

B = A⁻¹C = [ 0 1 −1 ; 2 −2 −1 ; −1 1 1 ] C = [ 7 15 0 14 15 18 20 ; 8 0 15 14 0 13 1 ; 9 14 0 19 20 0 0 ]

is the decoded message.

1.7.4 Transform Coding


Let a transformation be defined with the help of an n × n matrix A as follows:

y = Ax      (1.27)

where

x = [ x1 ; x2 ; … ; xn ],  y = [ y1 ; y2 ; … ; yn ],  A = (aij), i, j = 1, 2, …, n.

In general, A is invertible. The vector of pixels x is transformed into a vector of coefficients y. For some transformations, fewer bits are required to encode the n coefficients of y than the n pixels of x. In particular, if the elements x1, x2, …, xn are highly correlated and the transformation matrix A is chosen such that the coefficients y1, y2, …, yn are less correlated than the xi's, then the yi's can be individually coded more efficiently, as explained below. For example, choose A in Equation (1.27) as

A = [ 1 0 0 0 0 0 ; 1 −1 0 0 0 0 ; 0 1 −1 0 0 0 ; 0 0 1 −1 0 0 ; 0 0 0 1 −1 0 ; 0 0 0 0 1 −1 ].      (1.28)

in equation (1.27). The first element of y is y1 = x1 and all subsequent coef-


ficients are given by yi = xi−1 − xi . If the grey levels of the adjacent pixels
are similar, then the difference yi = xi−1 − xi will, on the average, be smaller
than the grey level that should require fewer bits to code them. This mapping
Matrices for Engineers 69

is invertible.
If A is a unitary matrix, then A−1 exists and A−1 = At , whereas if At is the
transpose of A, then (1.27) can be written as

x = At y. (1.29)
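The differencing transform defined by the matrix in (1.28) and its inverse can be sketched as follows (illustrative Python; the pixel values are made-up):

```python
# Differencing transform of Equation (1.28): y1 = x1, yi = x_{i-1} - x_i.
# Correlated (slowly varying) gray levels give small difference coefficients.
def forward(x):
    return [x[0]] + [x[i - 1] - x[i] for i in range(1, len(x))]

def inverse(y):
    # invert by running the recurrence x_i = x_{i-1} - y_i forward
    x = [y[0]]
    for i in range(1, len(y)):
        x.append(x[-1] - y[i])
    return x

pixels = [100, 102, 101, 104, 104, 103]
coeffs = forward(pixels)
print(coeffs)                     # [100, -2, 1, -3, 0, 1]
print(inverse(coeffs) == pixels)  # True: the mapping is invertible
```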

It is clear from (1.27) that each coefficient yk is a linear combination of the n pixels; that is,

yk = Σ_{i=1}^{n} aki xi,      (1.30)

for k = 1, 2, 3, …, n.
Similarly, by Equation (1.29), each pixel xi is a linear combination of all the coefficients:

xi = Σ_{k=1}^{n} bik yk,      (1.31)

for i = 1, 2, 3, …, n.
Equations (1.30) and (1.31) are similar to the expressions defining the forward and the inverse transformations, where aki is the forward transformation kernel and bik is the inverse transformation kernel.
For the two-dimensional case, (1.30) and (1.31) take the forms

ykl = Σ_{i=1}^{n} Σ_{j=1}^{n} xij aijkl      (1.32)

and

xij = Σ_{k=1}^{n} Σ_{l=1}^{n} ykl bijkl.      (1.33)

Here, aijkl and bijkl are the forward and inverse transformation kernels, respectively.
The Fourier, Walsh and Hadamard transforms are commonly used for encoding in this setting and they produce fairly good results. For example, the Fourier kernel is given by

aijkl = (1/N) e^{−j2π(ik+jl)/N}.
Further interpretation of (1.33) is possible. Let us write (1.33) in the form

X = Σ_{k=1}^{n} Σ_{l=1}^{n} ykl Bkl,      (1.34)

and interpret this as a series expansion of the n × n subimage X in terms of the n² basis matrices, each n × n,

Bkl = [ bkl11 bkl12 … bkl1n ; bkl21 bkl22 … bkl2n ; … ; bkln1 bkln2 … bklnn ],      (1.35)

where the ykl for k, l = 1, 2, …, n represent the coefficients (weights) of the expansion. Therefore (1.34) gives the image X as a weighted sum of the basis matrices Bkl. The coefficients of the expansion are given by (1.32), which may be written in the form

ykl = Akl · X,      (1.36)

where Akl is formed in the same manner as Bkl except that the forward kernel is used, the product in (1.36) denoting the sum of the elementwise products aijkl xij.
Let us consider the number of all possible values of the coefficients in Equation (1.27), where

yi = Σ_{j=1}^{n} aij xj.

Each element xj can have any of 2ᵐ different values, each aij xj term can also have any of 2ᵐ different values, and the sum of n such terms could have any of 2ᵐⁿ different values. Therefore, a natural binary representation would require mn-bit words to assign a unique word to each of the possible 2ᵐⁿ values of yi. Since only m-bit words would be required to code any xj, and our objective is to use fewer bits to code yi, we must round off yi to a smaller number of allowed levels.
A quantizer is a device whose output can have only a limited number of possible values; each input is forced to one of the permissible output values. For more details, we refer to Gonzalez and Woods [3].

1.7.5 Markov Matrix and Markov Process


Definition 27. A real n × n matrix A = (aij), i, j = 1, 2, …, n, is called a Markov matrix or row stochastic matrix if

(i) aij ≥ 0 for 1 ≤ i, j ≤ n;

(ii) Σ_{j=1}^{n} aij = 1 for 1 ≤ i ≤ n.

A Markov matrix is also known as a transition matrix or probability matrix. In other words, an n × n matrix is Markov if all entries are non-negative and the entries in each row sum to 1. (The same convention is sometimes applied to columns instead; such a column stochastic matrix appears later in this section.)
This matrix is named in honour of Andrei Andreyevich Markov (1856-1922), who was born in Ryazan, Russia, and died in St. Petersburg, Russia. He also discovered the process called a Markov chain or Markov process, of which the Markov matrix is an important ingredient. The five greatest applications of Markov chains are presented by Philipp von Hilgers and Amy N. Langville [10] and listed in chronological order:

(i) A. A. Markov's application to Eugene Onegin


(ii) C.E. Shannon’s application to information theory
(iii) Baum’s application to hidden Markov models

(iv) Scherr’s application to computer performance evaluation

(v) Brin and Page’s application to Web searches


In order of importance the list is:
1. Scherr’s application to computer performance evaluation
2. Brin and Page’s application to Web searches

3. Baum’s application to hidden Markov models


4. Shannon’s application to information theory
5. Markov’s application to Eugene Onegin

Definition 28. Consider a physical or mathematical system that has n possible states. At any one time, the system is in one and only one of its n states. Assume that at a given observation period, say the kth period, the probability of the system being in a particular state depends only on its state at the (k − 1)th period. Such a system is called a Markov chain or Markov process.

In this process the probability of the system being in a particular state at


a given observation period depends only on its state during the immediately
preceding observation period.
Suppose that the system has n possible states. For each i = 1, 2, . . . , n and
j = 1, 2, . . . , n, let pij be the probability that if the system is in state j at a
certain observation period, it will be in state i at the next observation period.
pij is called a transition probability. Since pij is a probability, we must have

0 ≤ pij ≤ 1, 1 ≤ i, j ≤ n.

Also, if the system is in state j at a certain observation period, it must be
in one of the n states (it may remain in state j) at the next observation
period. Thus, we have

p1j + p2j + p3j + · · · + pnj = 1.

It is clear that T = (pij ) is a Markov matrix (here in the column-stochastic
sense: each column sums to 1).

Definition 29. A Markov matrix T is called regular if, for some positive integer
r, all entries of T r are positive. A Markov chain or process is called regular if
its Markov matrix is regular.
Important properties of Markov matrices and chains are summarized below:
(i) If T is a regular Markov matrix, then as n approaches infinity, T n → S,
where S is a matrix of the form (v, v, . . . , v), i.e., every column is the same
constant vector v.
72 Modern Engineering Mathematics

(ii) If T is a regular Markov matrix of a Markov chain or process, and if X
is a state vector, then as n approaches infinity, T n X → p, where p is a
fixed probability vector (the sum of its entries is 1), all of whose entries
are positive.
(iii) Let T be a regular Markov matrix of a Markov chain or process and let
limn→∞ T n = S. Then limn→∞ T n X = SX = p; p is called the steady
state vector of the system.

(iv) S = T S. We know that T n+1 = T T n , limn→∞ T n+1 = S and


limn→∞ T n = S. This implies that S = T S.
(v) From S = T S we observe that T p = p for any column p of S. Therefore,
the steady state vector of a regular Markov chain with Markov matrix T is
the unique probability vector p satisfying T p = p.

(vi) A steady-state vector of a regular Markov chain is an eigenvector for the


Markov matrix corresponding to the eigenvalue 1.
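Properties (i) through (vi) are easy to check numerically. The following is an illustrative Python/NumPy sketch (not from the text); the column-stochastic matrix T is made up for the demonstration.

```python
import numpy as np

# A small regular (column-stochastic) Markov matrix; values chosen for illustration.
T = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# Property (i): T^n approaches a matrix S whose columns are all the same vector v.
S = np.linalg.matrix_power(T, 60)
v = S[:, 0]
print(np.allclose(S[:, 0], S[:, 1]))   # columns of S agree

# Properties (v)-(vi): v is a probability vector with T v = v,
# i.e., an eigenvector of T for the eigenvalue 1.
print(np.allclose(T @ v, v))
print(abs(v.sum() - 1.0) < 1e-9)
print(np.round(v, 6))                  # the steady state, here (0.6, 0.4)
```

For this particular T the steady state can also be found by hand from T p = p with p1 + p2 = 1, giving p = (0.6, 0.4).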
Example 52. An unofficial study of the weather in a city of Germany in
early spring yields the following observations:

(i) It is almost impossible to have two nice days in a row.

(ii) If we have a nice day, we are just as likely to have snow or rain the next
day.
(iii) If we have snow or rain, we have an even chance of having the same
weather the next day.
(iv) If there is a change from snow or rain, only half of the time is this a
change to a nice day.

(a) Write the Markov matrix to model this system.


(b) If today is nice, what is the probability of nice weather after one week?
(c) Find the long term behavior of the weather.

Solution: Since the weather tomorrow depends only on today, prediction is


a Markov chain (process). The Markov or transition matrix of this system is

          N     R     S
    N (  0    0.25  0.25 )
T = R ( 0.5   0.5   0.25 )
    S ( 0.5   0.25  0.5  )

where the letters N, R, and S represent nice, rain, snow respectively.


If today is nice, the initial state vector is
 
1
X0 =  0  .
0
After 7 days (one week), the state vector would be
X7 = T 7 X0
  
0.1999511 0.2000122 0.2000122 1
=  0.4000244 0.4000244 0.3999633   0 
0.4000244 0.3999633 0.4000244 0
 
0.1999511
=  0.4000244 
0.4000244
so, there is about 20% chance of nice weather in one week. We observe that our
Markov matrix is regular so existence of the steady state vector is guaranteed.
To find the steady state vector we solve the homogeneous system (T −I)X = 0
which has the following coefficient matrix:
 
( −1    0.25   0.25 )
( 0.5  −0.5    0.25 ) .
( 0.5   0.25  −0.5  )
Its reduced echelon form is
 
1 0 −0.5
 0 1 −1  .
0 0 0
The general solution of this system is
 
0.5t
 t  , t ∈ R.
t
Since 0.5t + t + t = 1, we get t = 0.4. Thus the steady state vector is
 
0.2
p =  0.4  .
0.4
In the long term, there is a 20% chance of having a nice day, a 40% chance of a
rainy day and a 40% chance of a snowy day.
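The computation in this example can be reproduced with a few lines of Python/NumPy (an illustrative check, not part of the text):

```python
import numpy as np

# Column-stochastic transition matrix of the weather example (columns: N, R, S).
T = np.array([[0.0, 0.25, 0.25],
              [0.5, 0.50, 0.25],
              [0.5, 0.25, 0.50]])
x0 = np.array([1.0, 0.0, 0.0])        # today is nice

# State vector after one week.
x7 = np.linalg.matrix_power(T, 7) @ x0
print(np.round(x7, 4))                # close to (0.2, 0.4, 0.4)

# Steady state: the probability eigenvector for the eigenvalue 1.
w, V = np.linalg.eig(T)
p = np.real(V[:, np.argmin(np.abs(w - 1.0))])
p = p / p.sum()
print(np.round(p, 6))                 # (0.2, 0.4, 0.4)
```

The eigenvector route and the homogeneous system (T − I)X = 0 used in the text give the same steady state.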
Example 53. Suppose weather in a city in India is either dry or rainy. Mete-
orological data shows that the probability of a rainy day following a dry day
is 1/3 and the probability of a rainy day following a rainy day is 1/2. Find the
state vectors for the first few days and the long term behavior of the weather.

Solution: Let state D be a dry day and state R be a rainy day. Then the
Markov matrix of this Markov process is

               D    R
(pij ) = D ( 2/3  1/2 )
         R ( 1/3  1/2 ).

Since all entries of (pij ) are positive, the system is a regular Markov chain.
Let us begin our observations (day 0). The weather is dry so the initial state
vector is
 
1
X0 = ,
0

a probability vector. The state vector on day 1 (the day after we begin
our observations) is
    
0.67 0.5 1 0.67
X1 = T X0 = =
0.33 0.5 0 0.33
where 2/3 is written as 0.67 and 1/3 as 0.33 respectively for the sake of conve-
nience. The probability of no rain on day 1 is 0.67, and the probability of rain
on that day is 0.33. Similarly
    
0.67 0.5 0.67 0.614
X2 = T X1 = = ,
0.33 0.5 0.33 0.386
    
0.67 0.5 0.614 0.604
X3 = T X2 = = ,
0.33 0.5 0.386 0.396
    
0.67 0.5 0.604 0.603
X4 = T X3 = = .
0.33 0.5 0.396 0.397
 
From the fourth day onward the state vector remains (approximately) the same:
(0.603, 0.397)T .
This is the steady state vector, which means that from the fourth day onward it
is dry about 60% of the time, and it rains about 40% of the time.
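The iteration above can be replayed in Python/NumPy (an illustrative check, not part of the text); the small differences from the text come from the text rounding 2/3 to 0.67.

```python
import numpy as np

# Column-stochastic matrix of the dry/rainy example (columns: today D, today R).
T = np.array([[2/3, 1/2],
              [1/3, 1/2]])
x = np.array([1.0, 0.0])               # day 0 is dry

for day in range(1, 5):
    x = T @ x                          # state vectors for days 1 through 4

print(np.round(x, 3))                  # already close to the steady state

# Exact steady state from T p = p with p1 + p2 = 1: p = (0.6, 0.4).
print(np.allclose(x, [0.6, 0.4], atol=0.01))   # True
```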

Additional Properties of Markov Matrices

(a) The product of two Markov matrices is a Markov matrix.
(b) Every eigenvalue λ of a Markov matrix satisfies |λ| ≤ 1, that is, the absolute
value of an eigenvalue of a Markov matrix is at most 1.
(c) For a positive Markov matrix (having only positive elements), λ = 1 is an
eigenvalue and every other eigenvalue satisfies |λ| < 1.
(d) A Markov matrix may have zero eigenvalues and zero determinant:
        ( 0 0 )
    A = ( 1 1 ).
(e) A Markov matrix can have the eigenvalue 1 repeated several times:
        ( 1 0 )
    A = ( 0 1 ).
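These eigenvalue properties can be verified numerically; the following Python/NumPy sketch (illustrative, not from the text; the matrix B is made up for the check) tests (a), (b), (d) and (e) on small examples.

```python
import numpy as np

A = np.array([[0.0, 0.0],
              [1.0, 1.0]])            # property (d): column-stochastic, det = 0
I2 = np.eye(2)                        # property (e): eigenvalue 1 twice
B = np.array([[0.7, 0.4],
              [0.3, 0.6]])            # another Markov matrix, for the checks

w = np.sort(np.linalg.eigvals(A).real)
print(np.allclose(w, [0.0, 1.0]))                        # eigenvalues 0 and 1
print(abs(np.linalg.det(A)) < 1e-12)                     # zero determinant
print(np.allclose(np.linalg.eigvals(I2).real, [1, 1]))   # eigenvalue 1 repeated

# Property (b): every eigenvalue satisfies |lambda| <= 1.
print(np.max(np.abs(np.linalg.eigvals(B))) <= 1 + 1e-12)

# Property (a): the product of two Markov matrices is again Markov.
print(np.allclose((A @ B).sum(axis=0), 1.0))
```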

1.8 Introduction to MATLAB® and MATHEMATICA


In this section we introduce two computer software packages, namely MATLAB and
MATHEMATICA. MATLAB stands for MATrix LABoratory; it was developed by
the MathWorks Corporation, Natick, MA (http://www.mathwork.com). Its inven-
tion is credited to Cleve Moler in the early 1970s. MATLAB is a high performance
language for technical computing. It integrates computation, visualization and
programming in an easy-to-use environment where problems and solutions are
expressed in familiar mathematical notation.
All themes discussed in previous sections can be carried out easily by MAT-
LAB. It is a favorite software of mathematicians and engineers, while MATH-
EMATICA is more attractive for physicists. MATHEMATICA is a powerful
programming language invented by Wolfram Research, Inc., with its formal
announcement on 23 June, 1988 (www.stephenwolfram.com/about). It is a
contemporary of MATLAB and performs better in many situations.
Stephen Wolfram is the creator of Mathematica.

1.8.1 MATLAB for Matrices


MATLAB has a wide range of capabilities. We introduce here a few of its
features. We refer to Chapter 9 and Chapter 10 of Kolman and Hill [5] for
a detailed discussion on MATLAB for linear algebra.
Matrix Input: To enter a matrix into MATLAB, type the entries of the
matrix within square brackets, with spaces between entries. Use
semicolons to indicate the ends of rows. The example below
 
9 −8 7
 −6 5 −4 
11 −12 0

is entered by typing [9 -8 7; -6 5 -4; 11 -12 0] and the accompanying
display is

9 −8 7
ans = −6 5 −4
11 −12 0

No brackets are displayed and MATLAB has assigned this matrix the name
ans. Every matrix in MATLAB must have a name. If a name is not assigned,
then MATLAB assigns ans, which is known as a default variable. To assign a
matrix name, we use the assignment operator, for example

A = [4 5 8; 0 -1 6]

is displayed as

        4  5  8
A =
        0 -1  6

Important Elements of MATLAB


MATLAB has many advantages compared with conventional computer lan-
guages or programs for solving technical problems. They can be summa-
rized as follows:
1. Easy to Use: MATLAB is an interpreted language that is very easy to
use. It has built-in functions that are optimized for specific problems and
thus makes it easy to evaluate certain mathematical expressions using
the values in their functions. Complex mathematical problems can be
solved easily with a few lines of MATLAB code while other languages
may require several hundred lines of code.
2. Platform Independence: The MATLAB software is supported by var-
ious operating systems such as Win XP/Vista, UNIX and MAC OS X.
A program written on one platform will run on other platforms.
3. Powerful Graphics: This is one of the most important features of MAT-
LAB for technical data analysis and interpretation. MATLAB supports
a wide variety of data so that it can be interpreted well. It supports
colored 2D and 3D plots and animation videos that make it unique.
4. Graphical User Interface: MATLAB includes tools that allow a pro-
grammer to develop a graphical user interface (GUI) for this program.
A programmer can design sophisticated data analysis programs that can
be operated by inexperienced users.
5. MATLAB Help: The help system is powerful and user friendly. It pro-
vides effective help that addresses most topics and commands. Users can
obtain help offline and online through the Internet. MATLAB provides
hundreds of example demonstrations for solving various problems users
may encounter when writing programs.
6. MATLAB Compiler: Flexibility and platform independence are
achieved by compiling MATLAB programs into a device-independent

p-code and p-instructions for use at run time. However, these steps
cause slow execution of programs, and a MATLAB compiler is available
to compile programs into executable (exe) files that run faster.
How to Run MATLAB?
Let us start with the MATLAB environment. We can open the window by
double clicking the MATLAB icon on the desktop. The main window (1.8.1)
that opens will be divided into three parts: current directory, command his-
tory, and command window.
The workspace shows the variables in use and their maximum and mini-
mum values. The functions of the current directory and command history are
self-explanatory. The command window is the function that allows a user to
write MATLAB commands.
MATLAB File Formats:
MATLAB can read and write various types of files. However, there are five
main types of files for storing data and/or programs that one will use fre-
quently. They are:
(a) M Files: These are standard ASCII text files with .m extensions written
in MATLAB editor. Our main program files are written in M formats.
We will later show an example of an M file.
(b) MAT Files: These are binary files with .mat extensions created when
we save variables of the MATLAB workspace.
(c) Fig Files: These are binary files with .f ig extensions used to store
graphics (i.e., curves obtained by plotting data).
(d) P Files: These are M files with .p extensions. They are used mainly for
distribution of MATLAB programs with hidden MATLAB codes.
(e) MEX Files: These are callable programs with .mex extensions and are
used to interface MATLAB with C and FORTRAN.
Starting MATLAB:
Variables, Operators and Matrices
Let us start with simple commands related to matrices. To enter a matrix,
simply enter the following when the command prompt appears in the com-
mand window: >> a = [1 2 3; 5 7 4; 9 8 6], then press Enter. The following
will appear in the command window: >> a =

1 2 3
5 7 4 .
9 8 6

It shows that our 3 × 3 matrix is stored in variable a. We can see that a
semicolon is used to separate the rows. Now let us do some operations on this

matrix. Simply type a' at the command prompt (>> a') and press Enter.
We get
 
1 5 9
ans =  2 7 8  .
3 4 6

Obviously, it is the transpose of matrix a. Here ans is the default variable
provided by MATLAB if we don't specify our own variable. Similarly, we can
perform other operations. Students can try the following commands at the
command prompt:
1. >> inv(a) [Will give the inverse of matrix a]
2. >> det(a) [Determinant of matrix a]
3. >> eig(a) [Eigenvalues of matrix a; [V, D] = eig(a) also returns eigenvectors]
4. >> diag(a) [Shows elements of main diagonal]
5. >> rank(a) [Finds rank of matrix a]
Now let us do some operations on two matrices such as summation, subtraction
or multiplication. Matrix a is already stored in memory, so we enter another
matrix b: >> b = [2 4 5; 1 6 8; 3 4 6];
Notice that we have put a semicolon at the end of matrix b. The function of
the semicolon is to suppress output in the workspace window, which becomes
useful when we generate very large matrices.
Students are advised to try the following operations on matrices a and b.
1. >> c = a + b [Summation of matrices a and b; result is stored in c]
2. >> c = a − b [Subtraction of matrix b from a]
3. >> c = a ∗ b [Multiplication of matrices a and b]
4. >> c = a. ∗ b [Element-by-element multiplication of matrices a and b]
5. >> c = a./b [Element-by-element division of matrices a and b]
Another operator called left division (\) is used in solving sets of linear alge-
braic equations (see MATLAB help for detail).
MATLAB offers relational and logical operators also. Please use MATLAB
help for this.
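For readers working without MATLAB, the commands above have direct NumPy counterparts; the sketch below (illustrative, using the matrices a and b from the text) mirrors them in Python.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [5, 7, 4],
              [9, 8, 6]], dtype=float)
b = np.array([[2, 4, 5],
              [1, 6, 8],
              [3, 4, 6]], dtype=float)

a_transpose = a.T                      # MATLAB: a'
a_inv = np.linalg.inv(a)               # MATLAB: inv(a)
a_det = np.linalg.det(a)               # MATLAB: det(a)
a_eigs = np.linalg.eigvals(a)          # MATLAB: eig(a)
a_diag = np.diag(a)                    # MATLAB: diag(a)
a_rank = np.linalg.matrix_rank(a)      # MATLAB: rank(a)

c1 = a + b                             # MATLAB: a + b
c2 = a - b                             # MATLAB: a - b
c3 = a @ b                             # MATLAB: a * b   (matrix product)
c4 = a * b                             # MATLAB: a .* b  (element-wise product)
c5 = a / b                             # MATLAB: a ./ b  (element-wise division)

print(np.allclose(a @ a_inv, np.eye(3)))   # True: a * inv(a) = I
```

As with MATLAB's left division, `np.linalg.solve(a, c)` is the preferred way to solve linear systems rather than forming the inverse explicitly.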

Writing Script Files (M − files):


In the previous section, we saw some basic operations performed by
writing commands at the prompt in the command window. It was easy
to perform single and easy operations on a variable. When we have to perform
several complicated and interdependent operations, it is not convenient to
use the command prompt. MATLAB provides an editor in which we can


write a program called Script which can be compiled and run and results are
displayed in the command window or figure window, if graphs are requested.
Here we are giving one very simple example of writing an M file to have a set
of linear algebraic equations. The equations are given below:

x + 2y − z = 10.
4x + 6y + z = 20.
x − 8y + 3z = 8.

These three equations can be written in matrix form as [A].[x] = [B]


    
1 2 −1 x 10
 4 6 1  y  =  20  .
1 −8 3 z 8

The solution can be obtained by [x] = [A]−1 [B]. (In MATLAB syntax it is
x = inv(A) ∗ B.)
The solution can be found by writing an M file in the editor and executing it.
To open editor window, go to the File menu of MATLAB window and select
New > M File and write following code.
% Solution of linear algebraic equations.

A = [1 2 -1; 4 6 1; 1 -8 3];
B = [10; 20; 8];
xyz = inv(A)*B;
xyz

After executing it, we get the following result:


 
8.6667
xyz =  −1.6667  .
−4.6667

The first line, which starts with %, is a comment line and is not executed.
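The same system can be solved in Python; numpy.linalg.solve is the analogue of MATLAB's inv(A)*B (and is the numerically preferred form):

```python
import numpy as np

# The system solved by the M-file: x + 2y - z = 10, 4x + 6y + z = 20, x - 8y + 3z = 8.
A = np.array([[1, 2, -1],
              [4, 6,  1],
              [1, -8, 3]], dtype=float)
B = np.array([10, 20, 8], dtype=float)

xyz = np.linalg.solve(A, B)    # preferred over inv(A) @ B
print(np.round(xyz, 4))        # approximately [ 8.6667 -1.6667 -4.6667]
```

The exact solution is x = 26/3, y = −5/3, z = −14/3, matching the decimal output of the M-file.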
Introduction to MATHEMATICA
Advantages and Disadvantages
• Powerful
• Easy to learn and use
• Powerful symbolic capabilities
• Coherent and unified
• Symbolic and uniform representation
• Numerical routines for complex problems
• Lots of add-on packages and books
• Scripted dynamic language
• Easy to misuse
Arithmetic Operations
Basic Addition, Multiplication etc.
Commands for expressions which should be evaluated are entered into the
input cells and displayed in bold.

Example 54. (Figures 1.15 through 1.17) Evaluate

−5 + 3 · (−2) + (2−3 + 3 · (5 − 7)) / (−2/3 + (3/4) · (−9 + 3)).

-5 + 3 (-2) + (2^-3 + 3 (5 - 7))/(-2/3 + (3/4) (-9 + 3))

-(1223/124)

FIGURE 1.15

If we want a numerical value of the last expression with, say 20 digits, then
in the cell we can type

N[%, 20]

-9.8629032258064516129

FIGURE 1.16

%2

-9.8629032258064516129

FIGURE 1.17

Algebraic Calculations
Mathematica can perform algebraic calculations too using several functions.

Expand[(x - 1)(2 x - 3)(x + 9)]

27 - 42 x + 13 x2 + 2 x3

FIGURE 1.18

Example 55. (Figure 1.18) Expand (x − 1)(2x − 3)(x + 9).

Example 56. (Figure 1.19) Factorize [x3 + 2x2 − 5x − 6].

Factor[x^3 + 2 x^2 - 5 x - 6]

(-2 + x)(1 + x)(3 + x)

FIGURE 1.19

Example 57. (Figure 1.20 though Figure 1.24) To simplify [(x2 − 3x)(6x −
7) − (3x2 − 2x − 1)(2x − 3)].

Simplify[(x^2 - 3 x)(6 x - 7) - (3 x^2 - 2 x - 1)(2 x - 3)]

-3 + 17 x - 12 x2

FIGURE 1.20

Together[2 x - 3 + (3 x - 5)/(x2 + x + 1)]

FIGURE 1.21

Apart[%8]

-3 + 2 x + (-5 + 3 x)/(1 + x + x2)

FIGURE 1.22

Commands on the same line within a cell must be separated by semicolons.


The output of any command that ends with a semicolon is not displayed.

x = 5/4 - 3/7; y = 9/5 + 4/9 - 9 x; z = 2 x^2 - 3 y^2

FIGURE 1.23

If we have to evaluate an expression with different values each time, the


substitution command symbolized with a slanted bar and a period is useful.
Using Functions

z = 2 x^2 - 3 x y + 2 y^2 /. {x -> 1, y -> 1}

FIGURE 1.24

Built-in Functions
All built-in Mathematical functions and constants have full names that begin
with a capital letter. The arguments of a function are enclosed by brackets.

• For example, the familiar functions from calculus sin x, cos x, ex , ln x in
Mathematica are Sin[x], Cos[x], Exp[x], Log[x].
• The constant π in Mathematica is Pi, while the Euler number e in
Mathematica is E.
Using Functions
User Defined Functions
In Mathematica the function f (x) = x2 + 2x is defined by:
We can evaluate f (a + b) + f (a − b).

f[x_] := x^2 + 2 x;

FIGURE 1.25

Similarly, for defining functions of several variables: g(x, t) = sin(πx) cos(2πt)

f[a + b] + f[a - b]

FIGURE 1.26

User defined functions


Define a piecewise function

g[x_, t_] := Sin[Pi x] Cos[2 Pi t];

FIGURE 1.27


       { x    if −π ≤ x < 0
h(x) = { x2   if 0 ≤ x < π
       { 0    elsewhere

Graphics

h[x_] := Piecewise[{{x, -Pi < x < 0}, {x^2, 0 < x < Pi}}, 0];
h[2]

FIGURE 1.28

Basic Plotting
Mathematica is exceptionally good at creating two− and three−dimensional
graphs.
• The general syntax is Plot[f, {x, xmin , xmax }].
• The plot of the function h(x) above on the interval [−π, π] is obtained
by typing the following (Figures 1.29 and 1.30):

Plot[h[x], {x, -Pi, Pi}]

FIGURE 1.29

Plot[Sin[x], {x, -2 Pi, 2 Pi}]

FIGURE 1.30

Multiple Plotting
MATHEMATICA can handle multiple plots.

• Syntax: Plot[{f1, f2}, {x, xmin , xmax }].

Plot[{Sqrt[x], x, x^2}, {x, 0, 2}]

FIGURE 1.31

Plotting Graphs
Plot a dotted curve on the plane joining the points (Figures 1.32 and 1.33).
Plot both graphs together (Figure 1.34).
Plot p1 next to p2 (Figure 1.35).
Plot a network/graph (Figure 1.36).

list = Table[{x, Sin[x]}, {x, -2 Pi, 2 Pi, 0.1}];
p1 = ListPlot[list]

FIGURE 1.32

p2 = ListLinePlot[list]

FIGURE 1.33

Plotting exponential graphs of one variable on a log-scaled vertical axis
(Figures 1.37 and 1.38).
Plotting on log-scaled vertical and horizontal axes (Figure 1.39). Plotting
parametric equations (Figure 1.40). Plotting a function of two or more
variables, here sin(x) sin2 (y) (Figure 1.41).

Show[p1, p2]

FIGURE 1.34

Show[GraphicsGrid[{{p1, p2}}]]

FIGURE 1.35

GraphPlot[{1 -> 2, 1 -> 4, 2 -> 4, 3 -> 4, 3 -> 2, 3 -> 5, 5 -> 1}, VertexLabeling -> True]

FIGURE 1.36

LogPlot[8^x, {x, 1, 5}]

FIGURE 1.37

LogLinearPlot[Exp[x^2], {x, 1, 5}]

FIGURE 1.38

LogLogPlot[Exp[x^2], {x, 1, 5}]

FIGURE 1.39

ParametricPlot[{Sin[t], Cos[t]}, {t, 0, 2 Pi}]

FIGURE 1.40

Plot3D[Sin[x] Sin[y]^2, {x, 0, Pi}, {y, 0, Pi}]

FIGURE 1.41

Plot Data
Plotting data by making a table and customizing the plot (Figure 1.42).
Fun with Plots

edata = Table[Sin[2 i], {i, 0, 2 Pi, Pi/12}];
ListPlot[edata, Filling -> Axis, PlotStyle -> {PointSize[Medium], Red}];
cdata = Table[Cos[i/2], {i, 0, 2 Pi, Pi/12}];
ListLinePlot[{cdata, edata}]
Length[cdata]

FIGURE 1.42

Manipulate[Plot3D[Sin[a x/t] Cos[a y/t], {x, 0, 2 Pi}, {y, 0, 2 Pi}],
{a, 0, 20}, {t, 1, 100}]

FIGURE 1.43

testdata = Table[{x, Sqrt[x]}, {x, 0, 4, 0.2}];
Manipulate[ListPlot[testdata, PlotMarkers -> {Automatic, Size}],
{Size, {Tiny, Small, Medium, Large}}, SaveDefinitions -> True]

FIGURE 1.44

Differentiation Operations
Single Variable Differentiation
nth derivative of a function f (x): D[f (x), {x, n}].

Example 58. Find d/dx [sin(1 + 4x2 )] (Figure 1.45).

D[Sin[1 + 4 x2], x]

FIGURE 1.45

Example 59. Find d2 /dx2 [sin(1 + 4x2 )] (Figure 1.46).
Partial Differentiation
Example 60. Find ∂ 3 /∂x∂y 2 [x3 y 2 + x sin(y 2 )] (Figure 1.47).
Integration Operations
Single Variable Integration

D[Sin[1 + 4 x2], {x, 2}]

8 Cos[1 + 4 x2] - 64 x2 Sin[1 + 4 x2]

FIGURE 1.46

6 x2 + 2 Cos[y2] - 4 y2 Sin[y2]

FIGURE 1.47
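As a quick sanity check (not part of the text), the symbolic derivative of sin(1 + 4x²) from Figure 1.45 can be compared against a central finite difference in Python:

```python
import math

def f(x):
    return math.sin(1 + 4 * x**2)

def df(x):
    # Symbolic result: d/dx sin(1 + 4x^2) = 8 x cos(1 + 4x^2)
    return 8 * x * math.cos(1 + 4 * x**2)

x0, h = 0.7, 1e-6
central = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central difference approximation
print(abs(central - df(x0)) < 1e-6)            # True
```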

Indefinite integral: Integrate[f [x], x] (Figure 1.48).
Definite integral: Integrate[f [x], {x, a, b}] (Figure 1.49).

Example 61. Evaluate ∫ sin(1 + 4x)dx.

Integrate[Sin[1 + 4 x], x]

-(1/4) Cos[1 + 4 x]

FIGURE 1.48

Example 62. Evaluate ∫0π sin(1 + 4x)dx.

Integrate[Sin[1 + 4 x], {x, 0, Pi}]

FIGURE 1.49

Multivariable Integration
Integrate[f [x, y], {x, a, b}, {y, c, d}]

Example 63. Evaluate ∫∫ (x2 sin y + xy)dxdy (Figures 1.50 and 1.51).

Example 64. Evaluate ∫0π ∫12 (x2 sin y + xy)dxdy.
Solving Single and Quadratic Equations

Example 65. Solve 2x2 − 3x + 1 = 0 for x (Figure 1.52).



Integrate[x2 Sin[y] + x y, x, y]

(1/12) x2 (3 y2 - 4 x Cos[y])

FIGURE 1.50

Integrate[x2 Sin[y] + x y, {x, 1, 2}, {y, 0, Pi}]

14/3 + (3 π2)/4

FIGURE 1.51

Solve[2 x2 - 3 x + 1 == 0, x]

{{x -> 1/2}, {x -> 1}}

FIGURE 1.52

Solve[x3 + x + 1 == 0, x][[1]]

(the first root, expressed in radicals involving (−9 + √93)1/3 )

FIGURE 1.53

Example 66. Solve x3 + x + 1 = 0 for x (Figure 1.53).

Example 67. Solve a linear system for x and y.

ax − 2y = −1
bx − y = 1

Solve[a x - 2 y == -1 && b x - y == 1, {x, y}]

{{x -> -3/(a - 2 b), y -> -(a + b)/(a - 2 b)}}

FIGURE 1.54

Solving Single Equations Numerically

NSolve[expr, vars] − solves numerically.
NSolve[expr, vars, Reals] − attempts to find numerical approximations over the
real number domain (Figure 1.55).
Example 68. Solve x5 − 2x + 1 = 0 for x over the real number domain.
FindRoot[f (x), {x, x0 }] − finds a numerical root of f (x) starting from x0
(Figures 1.56 and 1.57).

NSolve[x5 - 2 x + 1 == 0, x, Reals]

{{x -> -1.29065}, {x -> 0.51879}, {x -> 1.}}

FIGURE 1.55
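The same real roots can be found numerically in Python; numpy.roots takes the coefficients of x⁵ − 2x + 1 in descending powers (an illustrative check, not part of the text):

```python
import numpy as np

# Coefficients of x^5 - 2x + 1 in descending powers.
coeffs = [1, 0, 0, 0, -2, 1]
roots = np.roots(coeffs)

# Keep only the (numerically) real roots, sorted.
real_roots = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
print(np.round(real_roots, 5))   # approximately [-1.29065, 0.51879, 1.0]
```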

Example 69. Find all roots of f (x) = x − sin x near x = π.

FindRoot[x - Sin[x], {x, Pi}]

{x -> 2.80119 × 10−8 }

FIGURE 1.56

Example 70. Find solution of ex = −x near x = 0.

FindRoot[Exp[x] == -x, {x, 0}]

{x -> -0.567143}

FIGURE 1.57

Differential Equations
DSolve[eqn,y,{x1,x2..}] solves equations for the function y = y(x1, x2..) (See
Figures 1.58 through 1.60)

Example 71. Find the general solution of y''(x) + y(x) = ex (Figure 1.58), and
the solution subject to the initial conditions y(0) = 0, y'(0) = 1 (Figure 1.59).

Example 72. Solve: 3Zxx (x, t) − 2Ztt (x, t) = 1 (Figure 1.60).


Generating Matrices
Defining a Matrix (Figure 1.61).
Each individual list is the row of a matrix (Figure 1.62).

DSolve[D[y[x], {x, 2}] + y[x] == Exp[x], y[x], x]

{{y[x] -> C[1] Cos[x] + C[2] Sin[x] + (1/2) ex (Cos[x]2 + Sin[x]2 )}}

FIGURE 1.58

DSolve[{D[y[x], {x, 2}] + y[x] == Exp[x], y[0] == 0, y'[0] == 1}, y, x]

FIGURE 1.59

DSolve[3 D[Z[x, t], {x, 2}] - 2 D[Z[x, t], {t, 2}] == 1, Z, {x, t}]

{{Z -> Function[{x, t}, x2 /6 + C[1][t - Sqrt[2/3] x] + C[2][t + Sqrt[2/3] x]]}}

FIGURE 1.60

A = {{1, 0, 3, 4}, {3, 2, 0, 2}, {1, 1, 1, 1}}

{{1, 0, 3, 4}, {3, 2, 0, 2}, {1, 1, 1, 1}}

FIGURE 1.61

FIGURE 1.62 (the matrix A entered again; each inner list is one row)

B = MatrixForm[Table[1/(i + j), {i, 2}, {j, 3}]]

FIGURE 1.63

Converting data into a matrix (Figure 1.63).
Perform the required addition and multiplication operations (Figures 1.64 and 1.65).

X = {{1, 3, 4}, {3, 0, 2}, {1, 1, 1}}; Y = {{2, 1, 4}, {1, 1, 3}, {1, 2, 1}};
X + Y

{{3, 4, 8}, {4, 1, 5}, {2, 3, 2}}

FIGURE 1.64

X.Y

{{9, 12, 17}, {8, 7, 14}, {4, 4, 8}}

FIGURE 1.65

Matrix Customization
Determinant of a matrix (Figure 1.66)
Inverse of a matrix (Figure 1.67)

A = {{1, 0, 3}, {3, 2, 2}, {1, 1, 2}}

{{1, 0, 3}, {3, 2, 2}, {1, 1, 2}}

Det[A]

FIGURE 1.66

Transpose of a matrix: Transpose[A]

Inverse[A]

{{2/5, 3/5, -6/5}, {-4/5, -1/5, 7/5}, {1/5, -1/5, 2/5}}

FIGURE 1.67

Solving Linear Systems using Inverse Matrices

Example 73. Solve the system 3x1 − 2x2 + x3 = −1, 2x1 + x2 − x3 = 0,
3x1 − 2x2 − 3x3 = 5 (Figures 1.69 through 1.71).

FIGURE 1.68

A = {{3, -2, 1}, {2, 1, -1}, {3, -2, -3}}; b = {{-1}, {0}, {5}};
MatrixForm[b]

FIGURE 1.69

Alternative Solution: Computing Fourier Series with MATHEMAT-
ICA
Let f be a function which on the

B = Inverse[A]
Solution = B.b

FIGURE 1.70

LinearSolve[A, b]

{{-5/14}, {-11/14}, {-3/2}}

FIGURE 1.71

interval [−π, π] is given by f (x) = |x|.


Using MATHEMATICA, solve the following problems:
(a) Plot the function f on the interval [−3π, 3π].
(b) Find the Fourier coefficients of f .
(c) Plot several Fourier partial sums SN f .

Solution: (a) Define the function.


Replicate function over several periods:

ab[x_] := Piecewise[{{Abs[x], Abs[x] < Pi}}]
Plot[ab[x], {x, -3 Pi, 3 Pi}, PlotRange -> All]

FIGURE 1.72

periodicextension = Sum[ab[x + 2 k Pi], {k, -4, 4}];
Plot[periodicextension, {x, -3 Pi, 3 Pi}, PlotRange -> All]

FIGURE 1.73

Solutions (b) and (c)


Fourier basis and inner product

Fourier Coefficients
Calculating nth partial sums
Plot of first and third partial sums of Fourier series of f , along with plot of
function f .
Working with spreadsheets

s[n_, x_] := Sin[n x];
c[n_, x_] := Cos[n x];
IP[f_, g_] := 1/Pi Integrate[f g, {x, -Pi, Pi}];

FIGURE 1.74

a0[func_] := IP[func[x], c[0, x]];
aFC[func_, n_] := IP[func[x], c[n, x]];
bFC[func_, n_] := IP[func[x], s[n, x]];

FIGURE 1.75

fourierSeries[func_, N_, x_] :=
a0[func]/2 + Sum[aFC[func, n] c[n, x] + bFC[func, n] s[n, x], {n, 1, N}];
fs1[x_] = fourierSeries[ab, 1, x];
fs3[x_] = fourierSeries[ab, 3, x];

FIGURE 1.76

Plot[{periodicextension, fs1[x], fs3[x]}, {x, -2 Pi, 2 Pi}, PlotRange -> All]

FIGURE 1.77
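The Fourier coefficients computed above can be cross-checked numerically in Python. The sketch below (illustrative, not from the text) approximates the same inner product (1/π)∫ f·g dx over [−π, π] for f(x) = |x| by a Riemann sum, and compares with the known closed forms aₙ = 2((−1)ⁿ − 1)/(πn²) and bₙ = 0.

```python
import numpy as np

# Grid on [-pi, pi] and the function f(x) = |x|.
x = np.linspace(-np.pi, np.pi, 100001)
dx = x[1] - x[0]
f = np.abs(x)

def ip(g):
    # Approximates (1/pi) * Integrate[f*g, {x, -Pi, Pi}] by a Riemann sum.
    return np.sum(f * g) * dx / np.pi

a0 = ip(np.ones_like(x))                       # exact value: pi
a = np.array([ip(np.cos(n * x)) for n in range(1, 4)])
b = np.array([ip(np.sin(n * x)) for n in range(1, 4)])

# Closed forms for |x|: a_n = 2((-1)^n - 1)/(pi n^2), b_n = 0.
exact_a = np.array([2 * ((-1)**n - 1) / (np.pi * n**2) for n in range(1, 4)])
print(abs(a0 - np.pi) < 1e-3)                  # True
print(np.allclose(a, exact_a, atol=1e-3))      # a1 = -4/pi, a2 = 0, a3 = -4/(9 pi)
print(np.allclose(b, 0.0, atol=1e-6))          # sine coefficients vanish (f is even)
```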

You can import spreadsheets created in a variety of formats to take advantage


of MATHEMATICA’s rich data manipulation and visualization capabilities.
A spreadsheet is included in the Mathematica documentation folder.
Import only the first 3 rows of the spreadsheet (Figure 1.80). Import only
row 3 of the spreadsheet (Figure 1.81).

{{{1.31397 × 109 , China}, {1.09535 × 109 , India},
{2.98444 × 108 , United States}, {2.45453 × 108 , Indonesia},
{1.88078 × 108 , Brazil}, {1.65804 × 108 , Pakistan}, {1.47365 × 108 , Bangladesh},
{1.42894 × 108 , Russia}, {1.3186 × 108 , Nigeria}, {1.27464 × 108 , Japan}}}

FIGURE 1.78

a[[1]] // Grid

1.31397 × 109    China
1.09535 × 109    India
2.98444 × 108    United States
2.45453 × 108    Indonesia
1.88078 × 108    Brazil
1.65804 × 108    Pakistan
1.47365 × 108    Bangladesh
1.42894 × 108    Russia
1.3186 × 108     Nigeria
1.27464 × 108    Japan

FIGURE 1.79

Import["ExampleData/population.xls", {"Data", 1, {1, 2, 3}}]

{{1.31397 × 109 , China}, {1.09535 × 109 , India}, {2.98444 × 108 , United States}}

FIGURE 1.80

Import["ExampleData/population.xls", {"Data", 1, 3}]

{2.98444 × 108 , United States}

FIGURE 1.81

Data Manipulation
Plot data with explicit date values:

data = {{{2006, 10, 1}, 10}, {{2006, 10, 15}, 12}, {{2006, 10, 30}, 15}, {{2006, 11, 20}, 20}};
DateListPlot[data]

FIGURE 1.82

Plot monthly values starting in August 2000:

DateListPlot[{1, 1, 2, 3, 5, 8, 11}, {2000, 8}]

FIGURE 1.83

Plot multiple datasets: Retrieve and plot a historical stock price:

data1 = {{{2006, 10, 1}, 10}, {{2006, 10, 15}, 12}, {{2006, 10, 30}, 15}, {{2006, 11, 20}, 20}};
data2 = {{{2006, 10, 5}, 15}, {{2006, 10, 20}, 8},
{{2006, 11, 10}, 5}, {{2006, 11, 15}, 1}};
DateListPlot[{data1, data2}, Joined -> True]

FIGURE 1.84

DateListPlot[FinancialData["IBM", "Jan. 1, 2004"]]

FIGURE 1.85

Scope

Dates given as DateString specifications:


Dates given as elided DateList specifications:

data = {{"June 2006", 10}, {"August 2006", 12},
{"November 2006", 15}, {"January 2007", 20}};
DateListPlot[data]

FIGURE 1.86

data = {{{2006, 6}, 10}, {{2006, 8}, 12}, {{2006, 11}, 15}, {{2007, 1}, 20}};
DateListPlot[data]

FIGURE 1.87

Plot a series of data using an initial starting date or time.


Applications
Get stock price data.

DateListPlot[Sin[Range[100]/(2 Pi)], "August 15, 2006"]

FIGURE 1.88

DateListPlot[Sin[Range[100]/(2 Pi)], {2006, 8, 15, 12, 15, 0}]

FIGURE 1.89

data = FinancialData["GM", {2000, 1, 1}];

FIGURE 1.90

A sample data point:


Plot data gathered at regular intervals and stored without explicit dates:

Annual oil consumption since 1980:



data[[1]]

{{2000, 1, 3}, 33.14}

FIGURE 1.91

DateListPlot[data, Joined -> True, Filling -> Bottom]

FIGURE 1.92

data = {56.1, 60.7, 51.6, 52., 57.5, 56.7, 67.4, 69.9, 72.9, 69.7, 70.3, 72.1};
DateListPlot[data, {{2006, 6, 1, 8}, Automatic, "Hour"}]

FIGURE 1.93

1.9 Exercises
1.1. Let a = (1, −3, 2), b = (−1, 1, 1) and c = (2, 6, 9).
Find (i) a + (b + c) (ii) 4a − (b + c) (iii) c + 2(a − 3b)

FIGURE 1.94

(iv) 4(a + 2c) − 6b (v) ||a − c|| (vi) ||a|| ||2c||

(vii) || a/||a|| || + || b/||b|| || (viii) ||c||a + ||b||a.
1.2. (i) Find a unit vector in the opposite direction of a = (5, −5, 20).
(ii) Find a unit vector in the same direction as a = i − 3j + 2k.
(iii) Find a vector b that is 5 times as long as a = i − j + k in the same
direction as a.
1.3. Let a = (2, −3, 4), b = (−1, 2, 5) and c = (3, 6, −1).
(i) Find b.a = ⟨b, a⟩
(ii) a.(6b) = ⟨a, 6b⟩
(iii) b.b and c.c
(iv) a.(a + b + c)

(v) b.( a.b / b.b )
1.4. Find the orthogonal vectors from the following vectors:
(i) (2, 0, 1) (ii) 2i − j − k (iii) (−4, 3, 8)
(iv) i − 4j + 6k (v) (1, −1, 1) (vi) (4, −3, 8)
Indicate the pair of vectors which are non-orthogonal.
1.5. Find a scalar α so that given vectors are orthogonal:
(i) a = 2i − αj + 3k, b = 3i + 2j + 4k.
(ii) a = αi + (1/2)j + αk, b = −3i + 4j + αk.
1.6. Show that the vector c = b − (a.b/||a||2 ) a is orthogonal to the vector a.
1.7. Find the angle between following vectors
(i) a = 3i − k, b = 3i + 3k.

(ii) a = 2i + 4j + 0k, b = −i − j + 4k.


 
(iii) a = (1/2, 1/2, 3/2), b = (2, −4, 6).
1.8. Let a = (1, −1, 3) and b = (2, 6, 3). Find
(i) compb a (ii) compa b (iii) compa b − a.

1.9. Let a and b be two vectors and α and β be scalars, then show that

||αa + βb||² = α²||a||² + β²||b||² + 2αβ a.b.

1.10. Find the angle between two vectors a = (−1, 3, 1) and b = (0, 2, −4).

1.11. Let a = (5, 2, −1), b = (2, 0, −7). Compute a.(a × b) and b.(a × b).
1.12. Examine whether the following expressions are scalars or vectors:
(i) i × (−3k) (ii) i × (j × k) (iii) (2i − j + 5k) × i
(iv) k.(j × k) (v) ||4j − 5(i × j)|| (vi) (i × k) × (j × i).
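Answers to the vector exercises above can be spot-checked by computer. The short Python sketch below is for illustration only (the text's computational work uses MATLAB and MATHEMATICA, and the helper names here are ours, not the book's); it defines the dot product, cross product and Euclidean norm for 3-vectors:

```python
# Pure-Python helpers for checking vector-algebra answers
# (illustrative sketch; helper names are hypothetical).
import math

def dot(a, b):
    """Dot product a.b of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Euclidean norm ||a||."""
    return math.sqrt(dot(a, a))

# Example: a.(a x b) is always 0, since a x b is orthogonal to a.
a, b = (5, 2, -1), (2, 0, -7)
print(dot(a, cross(a, b)))   # 0
```

The final line illustrates the identity behind Exercise 1.11: the scalar triple product of a vector with a cross product involving itself vanishes.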
1.13. Determine whether the following vectors are linearly independent or
linearly dependent:
(i) (4,-8), (-6,12) in R2 .
(ii) (1,1), (0,1), (2,5) in R2 .
(iii) 1, (x + 1), (x + 1)^2 in P2.

1.14. Construct an orthonormal basis for R2 from the given basis:


(i) S = {(−3, 2), (−1, −1)}.
(ii) S = {(1, 1), (1, 0)}.
(iii) S = {(5, 7), (1, −2)}.

1.15. Using Gram-Schmidt orthogonalization, construct an orthonormal basis


for R3 from the following bases:
(i) S = {(1, 1, 0), (1, 2, 2), (2, 2, 1)}.
(ii) S = {(1, 1, 1), (9, −1, 1), (−1, 4, −2)}.
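The Gram-Schmidt process asked for in Exercises 1.14 and 1.15 can be sketched in a few lines of Python. This is an illustrative pure-Python sketch (not the text's MATLAB implementation): each vector has its projections onto the previously accepted basis vectors subtracted, and the remainder is normalized.

```python
# Minimal Gram-Schmidt orthonormalization sketch (illustrative only).
import math

def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            # subtract the projection of w onto the earlier basis vector u
            c = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - c * ui for wi, ui in zip(w, u)]
        n = math.sqrt(sum(wi * wi for wi in w))
        if n > 1e-12:          # skip linearly dependent vectors
            basis.append([wi / n for wi in w])
    return basis

B = gram_schmidt([(1, 1, 0), (1, 2, 2), (2, 2, 1)])
for u in B:
    print([round(x, 4) for x in u])
```

The output vectors are mutually orthogonal unit vectors, which can be confirmed by taking pairwise dot products.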

1.16. State the sizes of the matrices below:

(i) [1, 2, 3, 18; 5, 6, 0, 2] (ii) [1, 2; 8, 4; 5, 12] (iii) (6, 8, −30)

(iv) the column vector [2; 5; −6; 0; 7; −11; 2; 12].

1.17. Determine whether the following matrices are equal:

[1, 2, 3; 4, 5, 6], [1, 4; 2, 5; 3, 6], [1, 2; 0, 1], [1, 0; 2, 1], [1, 0; 0, 1].
1.18. Find values of x and y for which the following matrices are equal:

[1, x; y, −3] and [1, y − 2; 3x − 2, −3].
1.19. Find entries c2,3 and c1,2 for the matrix C = 2A − 3B, where

(i) A = [2, 3, −1; −1, 6, 0], B = [4, −2, 6; 1, 3, −3]

(ii) A = [1, −1, 1; 2, 2, 1; 0, −4, 1], B = [2, 0, 4; 0, 4, 0; 3, 0, 7].
   
1.20. If A = [2, −3; −5, 4] and B = [−1, 6; 3, 2], then find
(i) AB (ii) A² = AA (iii) B² = BB (iv) BA.
   
1.21. Suppose that A = [2, 4; −3, 2] and B = [4, 10; 2, 5]. Check the following properties for these two matrices:
(i) (A^T)^T = A (ii) (A + B)^T = A^T + B^T
(iii) (AB)^T = B^T A^T (iv) (6A)^T = 6A^T
1.22. Find two matrices A and B such that AB = 0 but A ≠ 0 and B ≠ 0.
1.23. Write the system of equations

2x1 + 6x2 + x3 = 7
x1 + 2x2 − x3 = −1
5x1 + 7x2 − 4x3 = 9

as a matrix equation AX = B, where X and B are column vectors.


1.24. Solve the following systems of equations

(i) 10x1 + 15x2 = 1.


3x1 + 2x2 = −1.

(ii) x1 − x2 − x3 = 8.
x1 − x2 + x3 = 3.
−x1 + x2 + x3 = 4.
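Hand solutions of small linear systems such as those in Exercise 1.24 can be checked with a minimal Gaussian elimination routine. The following Python sketch (with partial pivoting) is illustrative only; it is not the elimination code used in the text:

```python
# Gaussian elimination with partial pivoting (illustrative sketch).

def solve(A, b):
    """Solve the square system Ax = b; A is given as a list of rows."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):                  # eliminate below the pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                 # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# e.g. x1 + x2 = 3, x1 - x2 = 1 has solution (2, 1)
print(solve([[1, 1], [1, -1]], [3, 1]))   # [2.0, 1.0]
```

A computed answer can be verified further by substituting it back into the original equations and checking that each residual is (numerically) zero.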

1.25. Balance the following chemical equations



(i) Na + H2O → NaOH + H2
(ii) C5H8 + O2 → CO2 + H2O

1.26. Find the rank of the following matrices:

(i) [2, −2; 0, 0] (ii) [1, 1, 1; 1, 0, 4; 1, 4, 1] (iii) [3, −1, 2, 0; 6, 2, 4, 5].
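The rank computations in Exercise 1.26 can likewise be verified by row reduction: the rank is the number of nonzero rows after forward elimination. A rough pure-Python sketch (illustrative only, with a numerical tolerance) might look like this:

```python
# Rank via forward elimination (illustrative sketch, tolerance-based).

def rank(A, tol=1e-9):
    """Number of pivot rows found during forward elimination."""
    M = [list(map(float, row)) for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # find a pivot in column c at or below row r
        pivot = next((i for i in range(r, rows) if abs(M[i][c]) > tol), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):               # clear entries below pivot
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r

print(rank([[1, 2], [2, 4]]))   # 1: the second row is a multiple of the first
print(rank([[1, 0], [0, 1]]))   # 2
```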
1.27. An elementary matrix E is obtained by performing a single row operation on the identity matrix I. Verify that the following matrices are elementary:

(i) [0, 1, 0; 1, 0, 0; 0, 0, 1] (ii) [1, 0, 0; 0, 1, 0; 0, 0, 1] (iii) [1, 0, 0, 1; 0, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1].
1.28. Set up and solve the system of equations for the currents in the branches of the network shown below.

(Network diagram: two voltage sources, 10 V and 27 V, and three resistors.)
FIGURE 1.95: Electric Network

1.29. Examine whether the given set of vectors is linearly dependent or linearly
independent.

(i) u1 = (1, −1, 3, −1), u2 = (1, −1, 4, 2), u3 = (1, −1, 5, 7).
(ii) u1 = (1, 2, 3), u2 = (1, 0, 1), u3 = (1, −1, 5).
(iii) u1 = (2, 1, 1, 5), u2 = (2, 2, 1, 1), u3 = (3, −1, 6, 1), u4 = (1, 1, 1, −1).
 
1.30. Let A = [2, 3, 4; 1, −1, 2; −1, 3, 5]. Find the indicated minor, determinant or cofactor: M12, 2M32, c13, c22, c23.
   
1.31. Let A = [0, 2, 0; 3, 0, 1; 0, 5, 8] and B = [1, −1, −1; 2, 2, −2; 1, 1, 9]. Find det A and det B.
   
1.32. Let A = [a11, a12; a21, a22] and B = [a21, a22; a11, a12]. Show that det A = −det B.
 
1.33. Let A = [a11, a12; a21, a22]. Show that det A = det A^T.
   
1.34. Let A = [2, −1, 1; 3, 1, −1; 0, 2, 2] and B = [2, 1, 5; 4, 3, 8; 0, −1, 0]. Verify that det(AB) = det A det B.
1.35. Suppose that A is an n × n matrix such that A² = I, where A² = AA. Show that det A = ±1.
1.36. Check whether the following matrices are invertible; if so, find the inverse:

(i) [5, −1; 4, 1] (ii) [6, −2; 0, 4] (iii) [1, 2, 3; 0, −4, 2; −1, 5, 1] (iv) [1, 0, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1; 0, 1, 0, 0].
 
1.37. If A⁻¹ = [4, 3; 3, 2], then find A.
 
1.38. Find a value of α such that the matrix A = [4, −3; α, −4] is its own inverse.
 
1.39. Find the inverse of A = [sin θ, cos θ; −cos θ, sin θ].
1.40. Solve the following systems applying Cramer’s rule or indicate that the
rule does not apply

(i) 15x1 − 4x2 = 5


8x1 + x2 = −4

(ii) 8x1 − 4x2 + 3x3 = 0


x1 + 5x2 − x3 = −5
−2x1 + 6x2 + x3 = −4

(iii) x1 + x2 − 3x3 = 0
x2 − 4x3 = 0
x1 − x2 − x3 = 5

(iv) x1 − 2x2 − 3x3 = 3


x1 + x2 − x3 = 5
3x1 + 2x2 = −4
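Cramer's rule itself is easy to mechanize: replace the ith column of A by b, take determinants, and divide by det A. The Python sketch below (cofactor-expansion determinant; inefficient but clear, and illustrative rather than the text's method) also reports when the rule does not apply:

```python
# Cramer's rule with a recursive cofactor determinant (illustrative sketch).

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b; returns None when det(A) = 0 (rule does not apply)."""
    d = det(A)
    if d == 0:
        return None
    n = len(A)
    # x_i = det(A with column i replaced by b) / det(A)
    return [det([row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]) / d
            for i in range(n)]

print(cramer([[1, 1], [1, -1]], [3, 1]))   # [2.0, 1.0]
```

When `det(A)` is zero the system is either inconsistent or has infinitely many solutions, which is exactly the case Cramer's rule cannot decide.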

1.41. (a) Find the eigenvalues of each of the matrices below.
(b) Corresponding to each eigenvalue, find an eigenvector.

(i) [−2, 0; 1, 4] (ii) [1, −6; 2, 2] (iii) [0, 1; 0, 0]

(iv) [2, 0, 0; 1, 0, 2; 0, 0, 3] (v) [1, −2, 0; 0, 0, 0; −5, 0, 7].
1.42. Find the general form of all 2 × 2 matrices with real entries having eigenvalues 4 and −2.
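For a 2 × 2 matrix the eigenvalues follow directly from the characteristic polynomial λ² − (tr A)λ + det A = 0, so the quadratic formula gives them in closed form. The small Python sketch below (a hypothetical helper, useful for checking the 2 × 2 cases in Exercise 1.41) implements exactly this:

```python
# Eigenvalues of a 2x2 matrix from its characteristic polynomial
# (illustrative sketch; real-eigenvalue case only).
import math

def eig2(A):
    """Real eigenvalues of a 2x2 matrix via the quadratic formula."""
    (a, b), (c, d) = A
    tr = a + d                  # trace
    det = a * d - b * c         # determinant
    disc = tr * tr - 4 * det    # discriminant of lambda^2 - tr*lambda + det
    if disc < 0:
        raise ValueError("complex eigenvalues")
    s = math.sqrt(disc)
    return ((tr + s) / 2, (tr - s) / 2)

print(eig2([[2, 1], [1, 2]]))   # (3.0, 1.0)
```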

1.43. For each matrix below, produce a matrix that diagonalizes it or show that the matrix is not diagonalizable:

(i) [5, 3; 1, 3] (ii) [1, 0; −4, 1] (iii) [5, 0, 0; 1, 0, 3; 0, 0, −2].
1.44. Find the eigenvalues of the matrices below and, for each eigenvalue, a corresponding eigenvector. Verify that eigenvectors associated with distinct eigenvalues are orthogonal. Find an orthogonal matrix that diagonalizes each matrix:

(i) [4, −2; −2, 1] (ii) [−13, 1; 1, 4] (iii) [5, 0, 2; 0, 0, 0; 2, 0, 0] (iv) [1, 3, 0; 3, 0, 1; 0, 1, 1].
1.45. Discuss the use of matrices in rating world cities based on a number of variables such as cost of living, quality of education, availability and security of jobs, extent of cultural and recreation opportunities, transportation, health facilities, air quality, safety and crime, and climate.

1.46. Show that the Markov matrix

A = [0, 1, 0; 0, 0, 1; 1, 0, 0]

has complex eigenvalues and is orthogonal.

1.47. Prove that every eigenvalue λ of a Markov matrix satisfies |λ| ≤ 1.

1.48. Let A be a Markov matrix with all positive elements. Show that λ = 1 is the only eigenvalue of modulus 1 and that rank(A − In) = n − 1.

1.49. Describe the application of Markov chains in A. S. Pushkin's poem "Yevgeniy Onegin".

1.50. Discuss the relevance of Markov matrices and Markov chains to Shannon's communication theory.

1.51. Write an essay on the application of Markov chains and Markov matrices to computer performance evaluation, page rank and Web search.

1.10 Suggestion for Further Reading


Computational science, also called scientific computing or scientific computation, is a growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems. The scientific computing programme
aims to produce highly skilled computational scientists and engineers capable
of applying numerical methods and critical evaluation of their results to their
field. Scientific computing is introduced clearly by Heath [5].
Image processing is an important ingredient of modern engineering and
technology. Interested readers may pursue references [2] and [3] for acquiring
deeper knowledge. References [1, 4, 8, 11] are also useful.
Bibliography

[1] G. Burrill, J. Burrill, J. Landwehr, J. Witmer, Advanced Modeling and Matrices, Dale Seymour Publications, 1988.

[2] J. Gomes, L. Velho, Image Processing for Computer Graphics and Vision,
Springer, 2008.
[3] R. C. Gonzalez, R. E. Woods, Digital Image Processing, Third Edition, Prentice Hall, 2007.

[4] K. Hardy, Linear Algebra for Engineers and Scientists Using MATLAB, Pearson Prentice Hall, 2005.
[5] M. T. Heath, Scientific Computing: An Introductory Survey, Second Edition, McGraw-Hill, 2002.

[6] B. Kolman and D. R. Hill, Elementary Linear Algebra with Applications,


Ninth Edition, Pearson International, 2008.
[7] D. Lay, Linear Algebra and Its Applications, Fourth Edition, Addison Wesley, 2011.
[8] S. J. Leon, Linear Algebra with Applications, Seventh Edition, Pearson
Prentice Hall, 2006.
[9] H. Neunzert and A. H. Siddiqi, Topics in Industrial Mathematics: Case
Studies and Related Mathematical Methods, Kluwer Academic Publishers,
2000.

[10] P. V. Hilgers and A. N. Langville, The Five Greatest Applications of


Markov Chains, Langvillea. people.cofe.edu/Mcapps7pdf.
[11] D. G. Zill and M. R. Cullen, Advanced Engineering Mathematics, Jones and Bartlett Publishers, 2012.

Chapter 2
Differential Equations

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116


2.1.1 Definitions and Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
2.2 Introduction to Mathematical Modelling . . . . . . . . . . . . . . . . . . . . . . . . 126
2.2.1 Population Dynamics (Exponential and Logistic Model) 127
2.2.2 Radioactive Decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.2.3 Carbon Dating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.2.4 Newton’s Law of Cooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.2.5 Spread of Diseases and Rumors . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.2.6 Series Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.2.7 Draining Tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.2.8 Spring and Mass System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
2.2.9 Mixture of Salt and Payment of Loan . . . . . . . . . . . . . . . . . . 133
2.2.10 Predator-Prey Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
2.2.11 Model of Groundwater Contaminant Source . . . . . . . . . . . . 135
2.2.12 Heart Pacemaker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.2.13 X-Ray and Beer’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.2.14 Model of Spreading Information . . . . . . . . . . . . . . . . . . . . . . . . 137
2.2.15 Model for Circulation of Money . . . . . . . . . . . . . . . . . . . . . . . . . 138
2.3 MATLAB and MATHEMATICA for Differential Equations . . . . 138
2.3.1 Solving Ordinary Differential Equations (ODEs) by
MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
2.3.2 Solving Differential Equation by MATHEMATICA . . . . 141
2.4 Methods for Solving First Order Linear Differential Equations . 143
2.4.1 Method of Separation of Variables . . . . . . . . . . . . . . . . . . . . . . 143
2.4.2 Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.4.3 Exact Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
2.5 Methods for Solving Higher Order Differential Equations . . . . . . . 151
2.5.1 Initial Value and Boundary Value Problems . . . . . . . . . . . . 151
2.5.2 Homogeneous Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
2.5.3 Non-homogeneous Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
2.5.4 Reduction of Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2.5.5 Homogeneous Linear Equations with Constant
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
2.5.6 Method of Undetermined Coefficients . . . . . . . . . . . . . . . . . . . 162
2.5.7 Method of Variation of Parameters . . . . . . . . . . . . . . . . . . . . . 170
2.5.8 Cauchy-Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172


2.6 Solution of Engineering Problems Modeled by Differential


Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
2.6.1 Population Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
2.6.2 Radioactive Decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
2.6.3 Carbon Dating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
2.6.4 Newton’s Law of Cooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
2.6.5 Spread of Diseases, Technologies and Rumor . . . . . . . . . . . 183
2.6.6 Series Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
2.6.7 Draining Tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
2.6.8 Spring and Mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
2.6.9 Mixture of Salt and Payment of Loan . . . . . . . . . . . . . . . . . . . 187
2.7 Laplace Transform for Linear Differential Equations . . . . . . . . . . . . 190
2.7.1 Introduction to Laplace Transform . . . . . . . . . . . . . . . . . . . . . . 190
2.7.2 Translation Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.7.3 Inverse Laplace Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
2.7.4 Step and Impulse Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
2.7.5 Some Additional Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
2.7.6 Application to Differential and Integral Equations . . . . . 213
2.8 Series Solution of Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . 223
2.8.1 Review of Properties of Power Series . . . . . . . . . . . . . . . . . . . 223
2.8.2 Solution about Ordinary Point . . . . . . . . . . . . . . . . . . . . . . . . . . 226
2.8.3 Solution about Regular Singular Points: The Method of
Frobenius . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
2.8.4 Bessel’s Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
2.8.5 Legendre’s Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
2.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

2.1 Introduction
Mathematics has been considered the language of nature since the days of
Galileo and in modern times it is considered the mother of all technologies.
Differential equations were invented in the course of investigations of the laws
that govern the physical world. Sir Isaac Newton (1642-1727) first solved them in the 17th century and called them fluxional equations. Gottfried Leibniz, a contemporary of Newton credited with the invention of calculus, renamed fluxional equations as differential equations, a name that remains in use today.
Modern methods for solving differential equations and their practical ap-
plications constitute major portions of mathematics curricula for engineering
and science students. Leibniz invented some of the methods, but the general theory of differential equations was developed largely by the French mathematician Augustin-Louis Cauchy (1789–1857), who trained as an engineer.
In the 20th century, several Nobel Prizes were awarded for research that

involved differential equations. For example, Willard Libby received the prize for chemistry in 1960 for the invention of carbon dating. The 1952 physics prize was awarded to Felix Bloch and Edward Purcell. They calculated the rates of change over time of magnetic moments of nuclei at each point in a sample. The 1979 prize for medicine and physiology went to Allan Cormack and Godfrey Hounsfield for their work on computerized tomography (CT). Robert Merton and Myron Scholes received the 1997 economics prize for their work with Fischer Black on the Black-Scholes option pricing model.
Differential equations play a central role in describing phenomena that
involve changes. The main goal of this chapter is to introduce basic methods
for solving ordinary differential equations and developing modeling techniques
for application to real world problems. Methods for dealing with separation
of variables, undetermined coefficients, parameter variations, Cauchy-Euler
and Laplace transform methods, and series solutions are some of the classical
methods covered in this chapter. MATLAB and MATHEMATICA solutions
of differential equations are also discussed.
These methods are more efficient than analytical approximations based
on power series. Therefore Section 2.3 (MATLAB and MATHEMATICA for
Differential Equations) may be preferred over Section 2.8 (Series Solution of
Differential Equations) in some situations. The applications of modeling tech-
niques in various fields such as environmental problems, nuclear physics, elec-
tricity and electronics, mechanics, and biomedical engineering are presented
in Section 2.6.

2.1.1 Definitions and Terminology


Definition 30. Differential Equation
An equation containing derivatives of one or more dependent variables with respect to one or more independent variables is said to be a differential equation.
ral world involve the rates at which events (chemical reactions and disease
progression for example) happen. Mathematical formulations of laws contain-
ing derivatives are differential equations (DEs). The study of DEs is essential
to the proper understanding of fluid motion, current flows through electric
circuits, heat dissipation, detection of seismic waves, population dynamics,
spreading of diseases and even spreading of rumors. DEs that represent phys-
ical processes are often called mathematical models and they are discussed
throughout this chapter.
Definition 31. Ordinary Differential Equation
A differential equation is said to be an ordinary differential equation (ODE) if
it contains only ordinary derivatives of one or more dependent variables with
respect to a single independent variable.
Definition 32. Partial Differential Equation
An equation involving the partial derivatives of one or more dependent variables with respect to two or more independent variables is called a partial differential equation (PDE).

Example 74.

dy/dx + 20y = e^x,   d²y/dx² + 10y = cos x,

3 d²y/dx² − 9 dy/dx + 6y = 0,   dx/dt + dy/dt = 2x + y

are examples of differential equations.
Example 75.

∂²u/∂t² = ∂²u/∂x²,   ∂u/∂x = ∂²u/∂t²,   ∂²u/∂x² = −∂²u/∂y²,

∂²u/∂x² = ∂²u/∂t² − 2 ∂u/∂t,   and   ∂u/∂y = −∂u/∂x

are examples of partial differential equations.
Definition 33. Order of a Differential Equation
The order of differential equation (ODE or PDE) is the order of the highest
derivative in the equation.
Example 76. (i) The order of the differential equation

d²y/dx² + 5(dy/dx)⁴ − 4y = e^x is 2.

(ii) The order of the differential equation

d³y/dx³ + d²y/dx² − 4y = e^(−x) is 3.
Definition 34. The degree of a differential equation is the power (degree) to which the highest order derivative in the equation is raised.
Example 77. (i) The degree of the ODE

y(dy/dx)³ − x √(dy/dx) − 5 = 0 is 3.

(ii) The degree of the ODE

(d²y/dx²)³ − (dy/dx)² + 6y + 10 = 0 is 3.

Remark 17. (i) Very often the notations y′, y″, y‴, . . . , y⁽ⁿ⁾ are respectively used for

dy/dx, d²y/dx², d³y/dx³, . . . , dⁿy/dxⁿ.

(ii) In symbols we can express an nth order ordinary differential equation in one dependent variable by the general form

F(x, y, y′, y″, . . . , y⁽ⁿ⁾) = 0, where y⁽ⁿ⁾ = dⁿy/dxⁿ,   (2.1)

where F is a real valued function of n + 2 variables.
Definition 35. Linear and Non-linear Differential Equations
An nth order ordinary differential equation is said to be linear in y if it can be written in the form

an(x)y⁽ⁿ⁾ + an−1(x)y⁽ⁿ⁻¹⁾ + · · · + a1(x)y′ + a0(x)y = f(x)

where a0, a1, a2, . . . , an and f are functions of x on some interval, and an(x) ≠ 0 on that interval. The functions ak(x), k = 0, 1, 2, . . . , n are called coefficient functions. A differential equation that is not linear is called non-linear.
Example 78. (a) y″ − 4y′ + 3y = x⁴ and xy″ + ye^x + 6 = 0 are linear differential equations.

(b) (y − x)dx + 4x dy = 0, x³ d³y/dx³ − 4x dy/dx + 12y = e^x, and y″ − 4y′ + y = 0 are linear ordinary differential equations.

(c) The ODEs (1 + y)y′ + 2y = e^x, d²y/dx² + cos y = 0, and d³y/dx³ + y² = 0 are non-linear.

(d) A general form of the first order linear ordinary differential equation is

a1(x) dy/dx + a0(x)y = g(x).

(e) A general form of the second order linear ordinary differential equation is

a2(x) d²y/dx² + a1(x) dy/dx + a0(x)y = g(x).
Remark 18. An ordinary differential equation is linear if the following conditions are satisfied.
(i) The unknown function and its derivatives occur in the first degree only.
(ii) There are no products involving either the unknown function and its derivatives or two or more derivatives.

(iii) There are no transcendental functions involving the unknown function


or any of its derivatives.
Definition 36. Solutions: (i) A solution or general solution of an nth
order differential equation of the form (2.1) on an interval I = [a, b] = {x ∈
R/a ≤ x ≤ b} is any function possessing all the necessary derivatives, which
when substituted for y, y 0 , y 00 , . . . , y (n) , reduces the differential equation to an
identity. In other words an unknown function is a solution of a differential
equation if it satisfies the equation.
(ii) A solution of a differential equation of order n will have n independent
arbitrary constants. A solution obtained by assigning particular numerical val-
ues to some or all constants is called a particular solution.
(iii) A solution of a differential equation that is not obtainable from a general
solution by assigning particular numerical values is called a singular solu-
tion.
(iv) A real function y = φ(x) is called an explicit solution of the differential
equation F (x, y, y 0 , . . . , y (n) ) = 0 on [a, b] if

F (x, φ(x), φ0 (x), . . . , φ(n) (x)) = 0 on [a, b].

(v) A relation g(x, y) = 0 is called an implicit solution of the differential


equation F (x, y, y 0 , . . . , y (n) ) = 0 on [a,b] if g(x, y) = 0 defines at least one real
function f on [a,b] such that y = f (x) is an explicit solution on this interval.
We now illustrate these concepts through the following examples:
Example 79. (i) y = c1e^(x+c2) is a solution of the equation y″ − y = 0. This ODE is of order 2 and so its solution involves two arbitrary constants c1 and c2. It is clear that

y′ = c1e^(x+c2), y″ = c1e^(x+c2), and so c1e^(x+c2) − c1e^(x+c2) = 0.

Hence y = c1e^(x+c2) is a general solution or simply a solution.

(ii) y = ce^(2x) is a solution of the ODE y′ − 2y = 0, because y′ = 2ce^(2x) and y = ce^(2x) satisfy the ODE. Since the given ODE is of order 1, the solution contains only one constant.
(iii) y = cx + (1/2)c² is a solution of the equation

(1/2)(y′)² + xy′ − y = 0.

To verify the validity, we note that y′ = c, and therefore

(1/2)c² + cx − (cx + (1/2)c²) = 0.
(iv) y = c1e^(2x) + c2e^(−x) is a general solution of the differential equation y″ − y′ − 2y = 0 of order 2. To check the validity, we compute y′ and y″ and substitute into the equation:

y′ = 2c1e^(2x) − c2e^(−x), y″ = 4c1e^(2x) + c2e^(−x).

The left hand side of the given ODE is y″ − y′ − 2y = (4c1e^(2x) + c2e^(−x)) − (2c1e^(2x) − c2e^(−x)) − 2(c1e^(2x) + c2e^(−x)) = 0.
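Verifications like this one can also be spot-checked numerically: substitute the candidate solution and its hand-computed derivatives into the equation and confirm that the residual vanishes at several sample points. A short Python sketch for Example 79(iv) (Python is used only for illustration; the text's computational work uses MATLAB and MATHEMATICA):

```python
# Numerical spot-check of Example 79(iv): y = c1*e^(2x) + c2*e^(-x)
# should make the residual y'' - y' - 2y vanish for every x, c1, c2.
import math

def residual(x, c1, c2):
    y   = c1 * math.exp(2 * x) + c2 * math.exp(-x)
    yp  = 2 * c1 * math.exp(2 * x) - c2 * math.exp(-x)   # y'
    ypp = 4 * c1 * math.exp(2 * x) + c2 * math.exp(-x)   # y''
    return ypp - yp - 2 * y

for x in (-1.0, 0.0, 0.5, 2.0):
    print(abs(residual(x, c1=3.0, c2=-1.5)) < 1e-9)   # True each time
```

A nonzero residual at any sample point would immediately expose an error in the candidate solution or in the hand-computed derivatives.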
Example 80. (i) Choosing c = 1 we get a particular solution of the differential equation considered in Example 79(iii).
(ii) For c1 = 1 we get a particular solution of the differential equation in Example 79(i); that is, y = e^(x+c2) is a particular solution of y″ − y = 0.
Example 81. (i) y = −(1/2)x² is a singular solution of the differential equation given in Example 79(iii). y = −(1/2)x² is not obtainable from the general solution y = cx + (1/2)c². However, that it is a solution of the given differential equation can be checked: for y = −(1/2)x² we have y′ = −x, and by putting the values of y and y′ into the left hand side of the equation we get

(1/2)(−x)² + x(−x) − (−(1/2)x²) = (1/2)x² − x² + (1/2)x² = 0.

(ii) y = 0 is a singular solution of y′ = xy^(1/2).

Verification: The general solution of this equation is y = ((x²/4) + c)². No choice of the constant c gives the solution y = 0, so y = 0 is not obtainable from the general solution, yet it satisfies the equation. Hence y = 0 is a singular solution.
Example 82. (i) y = sin 4x is an explicit solution of y″ + 16y = 0 for all real x.
Verification: y′ = 4 cos 4x, y″ = −16 sin 4x. Putting the values of y and y″ into the left hand side of the equation we get −16 sin 4x + 16 sin 4x = 0.
Hence the equation is satisfied and y = sin 4x is an explicit solution of the given equation.
(ii) y = c1e^x + c2e^(−x) is an explicit solution of the equation y″ − y = 0.
Verification: y′ = c1e^x − c2e^(−x), y″ = c1e^x + c2e^(−x). Using the values of y and y″ in the left hand side of the given equation gives (c1e^x + c2e^(−x)) − (c1e^x + c2e^(−x)) = 0.
Example 83. (i) The relation x² + y² = 4 is an implicit solution of the differential equation

dy/dx = −x/y on the interval (−2, 2).

Verification: By implicit differentiation of the relation we get

2x + 2y dy/dx = 0 or dy/dx = −x/y.

Further, y1 = √(4 − x²) and y2 = −√(4 − x²) satisfy the relation (y² = 4 − x², so y = ±√(4 − x²)) and are solutions of the differential equation

dy/dx = −x/y.

It is clear that

y1′ = (1/(2√(4 − x²)))(−2x) = −x/y1

and

y2′ = −(1/(2√(4 − x²)))(−2x) = −x/y2.
(ii) The relation y² + x − 4 = 0 is an implicit solution of 2yy′ + 1 = 0 on the interval (−∞, 4).
Verification: Differentiating y² + x − 4 = 0 with respect to x, we obtain 2y dy/dx + 1 = 0, or 2yy′ + 1 = 0, which is the given differential equation. Hence y² + x − 4 = 0 is an implicit solution if it defines a real function on (−∞, 4). Solving the equation y² + x − 4 = 0 for y, we get y = ±√(4 − x). Both y1 = √(4 − x) and y2 = −√(4 − x), together with their derivatives, are functions defined for all x in the interval (−∞, 4). We conclude that y² + x − 4 = 0 is an implicit solution on this interval.
Remark 19. Note that a relation g(x, y) = 0 can reduce a differential equation to an identity without constituting an implicit solution of the differential equation. For example, x² + y² + 1 = 0 formally satisfies yy′ + x = 0, but it is not an implicit solution as it does not define a real-valued function: solving x² + y² + 1 = 0 gives y = ±√(−1 − x²), which is imaginary for every real x. The relation x² + y² + 1 = 0 is called a formal solution of yy′ + x = 0; that is, it appears to be a solution. Very often we look for a formal solution rather than an implicit solution.

Differential Equation of Family of Curves


Let us consider an equation containing n arbitrary constants. Then by differen-
tiating it successively n times we get n more equations containing n arbitrary
constants and derivatives. Now by eliminating n arbitrary constants from the
above (n + 1) equations and obtaining an equation which involves derivatives
up to the nth order, we get a differential equation of order n. The concept
of obtaining differential equations from a family of curves is illustrated in the
following examples.

Example 84. Find the differential equation of the family of curves y = ce^(2x).
Solution: Consider

y = ce^(2x). (2.2)

Differentiating Equation (2.2) we get

y′ = 2ce^(2x) = 2y

or

y′ − 2y = 0. (2.3)

Thus the arbitrary constant c is eliminated, and Equation (2.3) is the required differential equation of the family of curves given by Equation (2.2).
Example 85. Find the differential equation of the family of curves

y = c1 cos x + c2 sin x. (2.4)

Solution: Differentiating (2.4) twice we get

y′ = −c1 sin x + c2 cos x (2.5)
y″ = −c1 cos x − c2 sin x. (2.6)

c1 and c2 can be eliminated from (2.4) and (2.6), and we obtain the differential equation

y″ + y = 0. (2.7)

Equation (2.7) is the differential equation of the family of curves given by (2.4).

Initial Value and Boundary Value Problems


A general solution of an nth order ordinary differential equation contains n
arbitrary constants. To obtain a particular solution, we are required to specify n conditions on the solution function and its derivatives, and thereby expect to find the values of the n arbitrary constants. There are two well known kinds of auxiliary conditions: initial conditions and boundary conditions.
It may be observed that an ordinary differential equation need not have a solution and, when it does, the solution need not be unique. However, by imposing initial and boundary conditions, uniqueness can be ensured for certain classes of differential equations.
Definition 37. Initial Value Problem. If the auxiliary conditions for a given differential equation relate to a single x value, the conditions are called initial conditions, and the differential equation together with its initial conditions is called an initial value problem.
Definition 38. Boundary Value Problem. If the auxiliary conditions for a given differential equation relate to two or more x values, the conditions are called boundary conditions or boundary values. The problem of finding a solution of a differential equation with its boundary conditions is called a boundary value problem.

Example 86. (i) y 0 + y = 3, y(0) = 2 is a first-order initial value problem.


The order of the initial value problem is nothing but the order of the given
equation and thus y(0) = 2 is an initial value condition.
(ii) y 00 + 2y = 0, y(1) = 2, y 0 (1) = −3 is a second order initial value problem.
Initial conditions are y(1) = 2 and y 0 (1) = −3. Values of function y(x) and
its derivative are specified for value x = 1.
(iii) y 00 − y 0 + y = x3 , y(0) = 4, y 0 (1) = −2 is a second order boundary value
problem. Boundary conditions are specified at two points, namely x = 0 and
x = 1. One may specify boundary conditions for different values of x, say
x = 2 and x = 5. In this case the boundary value problem is
y 00 − y 0 + y = x3 , y(2) = 4, y 0 (5) = −2.
The following questions are relevant as boundary value and initial value prob-
lems represent important phenomena in nature:
Problem 1. When does a solution exist? That is, does an initial value problem
or a boundary value problem necessarily have a solution?
Problem 2. Is the known solution unique? That is, is there only one solution
of an initial value problem or a boundary value problem?
The following theorem states that under the specified conditions, a first
order initial value problem has a unique solution.
Theorem 15. Let f and fy = ∂f/∂y be continuous functions of x and y in some rectangle R of the xy-plane, and let (x0, y0) be a point in that rectangle. Then on some interval centred at x0 there is a unique solution y = φ(x) of the initial value problem:

dy/dx = f(x, y), y(x0) = y0.
Example 87. (i) y = 4e^x is a solution of the initial value problem:

y′ = y, y(0) = 4.

This means that the solution of the differential equation y′ = y passes through the point (0, 4).
Verification: Let y = ce^x, where c is an arbitrary constant. Then y′ = ce^x = y, so y = ce^x is a general solution of the given equation y′ = y. By applying the initial condition we get 4 = y(0) = ce^0, or c = 4. Therefore y = 4e^x is a solution of the given initial value problem.
(ii) Find a solution of the initial value problem y′ = y, y(1) = 3. That is, find a solution of the differential equation y′ = y which passes through the point (1, 3).
Solution: As seen in part (i), y = ce^x is a solution of the given equation, and by imposing the given initial condition we get

3 = y(1) = ce^1 or c = 3e^(−1).

Therefore y = 3e^(−1)e^x = 3e^(x−1) is a solution of the initial value problem.
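Initial value problems of this kind can also be solved approximately by stepping methods. As an aside (the forward Euler method is not introduced at this point in the text; this is only an illustrative Python sketch), stepping y′ = y from y(1) = 3 and comparing against the exact solution y = 3e^(x−1):

```python
# Forward-Euler integration of y' = y, y(1) = 3 on [1, 2], compared with
# the exact solution y = 3*e^(x-1); the step count is illustrative.
import math

def euler(f, x0, y0, x_end, n):
    """n forward-Euler steps for y' = f(x, y), y(x0) = y0; returns y(x_end)."""
    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)
        x += h
    return y

approx = euler(lambda x, y: y, 1.0, 3.0, 2.0, 100000)
exact = 3 * math.exp(1.0)            # y(2) = 3e^(2-1) = 3e
print(approx, exact, abs(approx - exact))
```

The discrepancy shrinks roughly in proportion to the step size, which is the expected first order behaviour of the method.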

FIGURE 2.1: Geometrical Illustration of Theorem 15

Example 88. dy/dx = xy^(1/2), y(0) = 0 has at least two solutions, namely y = 0 and y = x⁴/16.
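That both candidates really are solutions is easy to confirm directly, since y′ = 0 for the first and y′ = x³/4 for the second. The small Python check below (illustrative only) compares each hand-computed derivative with x·y^(1/2) at sample points x ≥ 0:

```python
# Check that y1 = 0 and y2 = x^4/16 both satisfy y' = x*y^(1/2) for x >= 0,
# so the IVP of Example 88 has (at least) two solutions through (0, 0).
import math

def ok(y, yprime, xs, tol=1e-12):
    """True if y'(x) equals x*sqrt(y(x)) at every sample point."""
    return all(abs(yprime(x) - x * math.sqrt(y(x))) < tol for x in xs)

xs = [0.0, 0.5, 1.0, 2.0, 3.0]
print(ok(lambda x: 0.0, lambda x: 0.0, xs))               # True
print(ok(lambda x: x**4 / 16, lambda x: x**3 / 4, xs))    # True
```

Non-uniqueness here does not contradict Theorem 15: ∂f/∂y = x/(2√y) fails to be continuous at y = 0, so the hypotheses of the theorem are not satisfied at the initial point (0, 0).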
Example 89. (i) Does a solution of the boundary value problem y'' + y = 0,
y(0) = 0, y(π) = 2 exist?
(ii) Show that the boundary value problem

y'' + y = 0,   y(0) = 0,   y(π) = 0

has infinitely many solutions.


Solution: (i) The general solution is y = c1 cos x + c2 sin x. Applying the
boundary conditions, we get

0 = c1 cos 0 + c2 sin 0
and 2 = c1 cos π + c2 sin π.

The first equation yields c1 = 0 and the second yields c1 = −2 which is absurd,
hence no solution exists.
(ii) The boundary values yield

0 = c1 cos 0 + c2 sin 0

and
0 = c1 cos π + c2 sin π.

Both equations yield c1 = 0, while c2 remains unconstrained and may take
any value. Thus there are infinitely many solutions, represented by
y = c2 sin x.
Example 90. Examine the existence and uniqueness of a solution of the
following initial value problems:
(i) y' = y/x, y(2) = 1.
(ii) y' = y/x, y(0) = 3.
(iii) y' = −x, y(0) = 2.
Solution: We examine whether the conditions of Theorem 15 are satisfied.
(i) Here
      f (x, y) = y/x   and   ∂f/∂y (x, y) = 1/x.
Both functions are continuous except at x = 0. Hence f and ∂f/∂y satisfy the
conditions of the theorem in any rectangle R that does not contain any part
of the y-axis (x = 0). Since the point (2, 1) is not on the y-axis, there is a
unique solution. One can check that y = x/2 is the only solution.
(ii) In this problem neither f nor ∂f/∂y is continuous at x = 0, and the initial
point (0, 3) lies on the y-axis, so Theorem 15 is not applicable.
(iii) Here f (x, y) = −x and ∂f/∂y = 0 are continuous everywhere, so the
problem has a unique solution (namely y = 2 − x²/2).
Definition 39. A first order differential equation of the form dy/dx =
g(x)h(y), where g(x) and h(y) are functions of x only and of y only, respec-
tively, is called separable or said to have separable variables.

2.2 Introduction to Mathematical Modelling


A mathematical description of a law, a process, a system or a phenomenon
is called a mathematical model. Mathematical models could be demon-
strated in terms of functions, matrices and differential equations with or with-
out initial value and boundary conditions. In general a mathematical model
may not have a solution, and even if it has one, it may not be unique. Appropriate
initial value and boundary value conditions ensure existence and uniqueness
of the solution of a model. Construction of a mathematical model begins with
identification of variables that are responsible for change in the situation un-
der consideration. For the sake of simplicity one may consider only a few of
these variables. Once the process or system is understood, the remaining
variables, or a few more, can be incorporated. Since the assumptions about a

process or system often involve a rate of change of one or more of the vari-
ables, the mathematical representation of these assumptions may be one or
more equations containing derivatives; that is, the mathematical model may
be a differential equation or a system of differential equations. Once a model is
obtained (Figure 2.2) we look for existence and uniqueness of its solution. If
we can solve it, we claim that the model is reasonable if its solution is con-
sistent with experimental data or known facts about the process or system.
If the predictions of the solution are of poor quality, we may increase the
number of variables or make alternative assumptions about the system. The
steps of modelling are shown in the following diagram:

Assumptions → express the assumptions in terms of differential equations →
mathematical formulation → solve the differential equations → obtain
solutions → display model predictions (e.g., graphically) → check model
predictions against known facts → if necessary, alter the assumptions or
increase the resolution of the model.
FIGURE 2.2: Steps in Modelling

It may be noted that a mathematical model of a physical system frequently


involves the variable time t. A solution of such a model gives the state of the
system; that is, for appropriate values of t, the values of the dependent variable
or variables describe the system in the past, present and future.

2.2.1 Population Dynamics (Exponential and Logistic Model)


In 1798, the British economist Thomas Malthus made the first attempt
to devise a mathematical model of human population growth. He made the
assumption that the rate at which a population of a country grows at a certain
time is proportional to the total population of the country at that time. In
mathematical language, if P (t) denotes the total population at time t, then
this assumption can be expressed as
      dP/dt ∝ P   or   dP/dt = λP,                    (2.8)
dt dt

where λ is the constant of proportionality. This simple model does not take
into account many factors, such as immigration and emigration, that can
influence populations to grow or decline. However, it predicts the population
correctly in many cases, for example the population of the United States of
America during the period 1790 to 1860. Around 1840 the Belgian mathe-
matician and biologist P. F. Verhulst devised a mathematical model to predict
the human population of a country or city. This model is a generalization of
the Malthusian model and is known as the logistic equation; its solution is
called the logistic function. The graph of a logistic function is called a
logistic curve. Let P (t)
denote the population at time t and let a > 0, b > 0 be constants whose
values depend on the population. The logistic model of the population growth
is given as

      dP(t)/dt = aP(t) − bP²(t).                      (2.9)
Verhulst had argued that the rate of change of the population P (t) with re-
spect to t should be influenced by growth factors such as population itself and
also factors retarding the population, namely limitations of food and space.
The first term aP(t) represents the growth factor while −bP²(t) represents the
retarding factor. In applications a is very often much larger than b. It is clear that
(2.9) reduces to (2.8) if b is neglected.
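The effect of the retarding term can be seen by integrating (2.9) numerically: the population levels off at the carrying capacity a/b instead of growing without bound. The sketch below uses a simple forward-Euler scheme with illustrative constants a = 0.1 and b = 0.001 (not taken from the text):

```python
a, b = 0.1, 0.001    # growth and retardation constants (illustrative values)
P = 10.0             # initial population
dt = 0.1             # time step

# forward-Euler integration of dP/dt = a*P - b*P**2
for _ in range(20000):
    P += dt * (a * P - b * P**2)

carrying_capacity = a / b               # equilibrium where dP/dt = 0
print(round(P, 3), carrying_capacity)   # prints 100.0 100.0
```

Note that the same code with b = 0 reproduces the unbounded Malthusian growth of (2.8).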

2.2.2 Radioactive Decay


The nucleus of an atom consists of combinations of protons and neutrons.
The atoms decay or transmute into atoms of another substance. Such nuclei
are called radioactive. For instance, the radioactive radium, Ra−226, trans-
mutes into a radioactive gas, Rn−222, over a certain period of time. In mod-
eling the phenomenon of radioactive decay, it is assumed that the rate at
which the nuclei of a substance decay is proportional to the amount A(t) of
the substance remaining at time t, that is,

      dA/dt ∝ A(t)   or   dA/dt = λA;                 (2.10)
(2.10) is similar to (2.8), which shows that a single differential equation can serve
as a mathematical model for different physical phenomena. In physics the half
life is simply the time it takes for one-half of the atoms in an initial amount
to disintegrate or transmute into atoms of another element. For example the
half life of highly radioactive radium, Ra−226, is about 1700 years. The half
life of the uranium isotope U−238 is approximately 4,500,000,000 years. For
λ > 0, λ is called the growth constant; for λ < 0, the decay constant.
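For decay, λ < 0 and the solution of (2.10) is A(t) = A₀e^(λt); the half-life T and λ are related by λ = −ln 2/T. A short sketch (not part of the text) using the text's figure of about 1700 years for Ra−226:

```python
import math

T_half = 1700.0                  # half-life of Ra-226 in years (figure from the text)
lam = -math.log(2) / T_half      # decay constant, chosen so that A(T_half) = A0/2

def amount(A0, t):
    # amount remaining after t years: solution of dA/dt = lam * A
    return A0 * math.exp(lam * t)

assert abs(amount(100.0, T_half) - 50.0) < 1e-9       # one half-life halves the amount
assert abs(amount(100.0, 2 * T_half) - 25.0) < 1e-9   # two half-lives quarter it
```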

2.2.3 Carbon Dating


The solution of Equation (2.10) is the basis of an important technique devel-
oped by the chemist Willard Libby in 1950 to determine the ages of certain
artifacts. Libby was awarded the 1960 Nobel Prize in chemistry for this work. The
process of estimating the age of an artifact or fossil is called carbon dating.
The theory of carbon dating is essentially based on the knowledge that the
half life of radioactive C−14 is approximately 5600 years. Libby’s method has
been used to date furniture in Egyptian tombs, to detect Van Meegeren art
forgeries and to date different civilizations through archaeological excavation.
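Given the fraction of C−14 remaining in a sample, the age follows by inverting A(t) = A₀e^(λt), i.e. t = ln(A/A₀)/λ. A sketch of the computation using the text's 5600-year half-life (the sample fraction below is invented for illustration):

```python
import math

T_half = 5600.0                  # half-life of C-14 in years (figure from the text)
lam = -math.log(2) / T_half      # decay constant

def age(fraction_remaining):
    # invert A(t) = A0 * exp(lam * t) for t, given the measured ratio A/A0
    return math.log(fraction_remaining) / lam

# a sample retaining one quarter of its original C-14 is two half-lives old
assert abs(age(0.25) - 2 * T_half) < 1e-9
```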

2.2.4 Newton’s Law of Cooling


Newton’s law of cooling states that the rate of change at which a body cools
is proportional to the difference between the temperature of the body and the
temperature of the surrounding medium. Let T (t) represent the temperature
of a body at time t, Tm the temperature of the surrounding medium, and
dT/dt the rate at which the temperature of the body changes. The mathe-
matical formulation of Newton's law of cooling is
      dT/dt ∝ T − Tm   or   dT/dt = λ(T − Tm ),      (2.11)
where λ is a constant of proportionality.
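Equation (2.11) has the explicit solution T(t) = Tm + (T(0) − Tm)e^(λt), so the temperature approaches Tm exponentially. The sketch below checks a forward-Euler integration of (2.11) against this closed form, with made-up values (a 90° body in a 20° medium, λ = −0.1 per minute):

```python
import math

Tm, T0, lam = 20.0, 90.0, -0.1   # ambient temp, initial temp, rate constant (illustrative)

def exact(t):
    # closed-form solution of dT/dt = lam*(T - Tm), T(0) = T0
    return Tm + (T0 - Tm) * math.exp(lam * t)

# forward-Euler integration of the same equation up to t = 10
T, dt = T0, 0.001
for _ in range(10000):
    T += dt * lam * (T - Tm)

assert abs(T - exact(10.0)) < 0.01   # numerical and exact solutions agree
```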

2.2.5 Spread of Diseases and Rumors


While modeling spread of a contagious disease like a flu virus, it is quite
reasonable to assume that rate of the spread of disease is proportional to the
number of persons P(t) who have contracted the disease and the number of
persons N(t) who have not yet been exposed. If the rate of spread is dP(t)/dt,
then
      dP(t)/dt = λP(t)N(t),                           (2.12)
where λ is the constant of proportionality. Assume that one infected person
enters a fixed population of M persons; then P, N and M are related by
P + N = M + 1, or N = M − P + 1. Substituting this value of N in (2.12),
we obtain
      dP(t)/dt = λP(t)(M − P(t) + 1).                 (2.13)
dt
A solution of the following initial value problem provides the number of in-
fected persons at any time t:
      dP(t)/dt = αP(t)(M + 1 − P(t)),                 (2.14)
      P(0) = 1.                                       (2.15)

Model (2.14) can be used for spread of a rumor; see Brenan [5, p.717].
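Equation (2.14) is again a logistic equation; separating variables gives P(t) = (M + 1)/(1 + M e^(−α(M+1)t)), so eventually the whole population of M + 1 persons is infected (or has heard the rumor). A quick numerical sanity check (a sketch, with α and M chosen arbitrarily):

```python
import math

alpha, M = 0.002, 999    # spreading rate and initial uninfected pool (illustrative)

def P(t):
    # closed-form solution of dP/dt = alpha*P*(M + 1 - P) with P(0) = 1
    return (M + 1) / (1 + M * math.exp(-alpha * (M + 1) * t))

assert abs(P(0.0) - 1.0) < 1e-9   # one infected person at t = 0

# the derivative of P matches the right-hand side of (2.14)
h = 1e-6
for t in [0.5, 1.0, 2.0]:
    lhs = (P(t + h) - P(t - h)) / (2 * h)
    rhs = alpha * P(t) * (M + 1 - P(t))
    assert abs(lhs - rhs) < 1e-2 * abs(rhs)

assert abs(P(100.0) - (M + 1)) < 1e-6   # the disease saturates the population
```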

2.2.6 Series Circuit


Let us consider the single-loop series circuit shown in Figure 2.3, containing
an inductor (L), a resistor (R, treated as constant) and a capacitor (C).


FIGURE 2.3: Series Circuit

Let E(t) denote the impressed voltage on the closed loop, θ(t) denote the
charge on the capacitor and I(t) denote the current after the switch is
closed. By Kirchhoff's second law, the impressed voltage E(t) on a closed
loop must equal the sum of the voltage drops in the loop. The current I(t) is
related to the charge θ(t) on the capacitor by
      I(t) = dθ(t)/dt.
Adding the three voltage drops L dI/dt = L d²θ/dt² (inductor), IR =
R dθ/dt (resistor) and θ(t)/C (capacitor), and equating the sum to the im-
pressed voltage E(t), we get
      L d²θ/dt² + R dθ/dt + (1/C)θ = E(t).            (2.16)
We shall discuss the solution of the model given by (2.16) in Section 2.6.5. If
E(t) = 0, the electric vibrations of the circuit are said to be free vibrations.
A circuit is called overdamped if R² − 4L/C > 0, critically damped if
R² − 4L/C = 0, and underdamped if R² − 4L/C < 0. For a series circuit


containing only a resistor and an inductor, Kirchhoff's second law states that
the sum of the voltage drop across the inductor, L dI/dt, and the voltage
drop across the resistor, IR, equals the impressed voltage E(t) on the circuit:
      L dI/dt + RI = E(t).
The above equation is the model for current flow, where L and R are constants
known as the inductance and resistance, respectively. The current I(t) is also
called the response of the system.
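For a constant impressed voltage E, the RL equation has the solution I(t) = (E/R)(1 − e^(−Rt/L)): the response rises from zero toward the steady-state current E/R with time constant L/R. A sketch with illustrative component values (not from the text):

```python
import math

L, R, E = 0.5, 10.0, 12.0   # inductance (H), resistance (ohm), voltage (V); illustrative

def current(t):
    # solution of L*dI/dt + R*I = E with I(0) = 0
    return (E / R) * (1 - math.exp(-R * t / L))

tau = L / R                                    # time constant of the circuit
assert current(0.0) == 0.0                     # no current before the switch closes
assert abs(current(10 * tau) - E / R) < 1e-3   # response settles at E/R = 1.2 A
```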

2.2.7 Draining Tank


In Figure 2.4 a cylindrical container having a constant cross section area A
is shown. The orifice at the base of the container has a constant cross sectional
area B. If the container is filled with water to a height h, water will flow out
through the orifice.

FIGURE 2.4: Draining a Tank

Let h be the height at time t and h + ∆h be the height at t + ∆t. The


volume of water lost when the level drops by ∆h equals the volume of water
that flows out through the orifice. The change in the stored volume in time
∆t is −A∆h (the negative sign indicates a loss of volume). The volume
of water flowing through the orifice in time ∆t is the volume contained in a
cylinder of cross section B and length ∆s. Thus we have

−A∆h = B∆s.

Dividing by ∆t and taking limit as ∆t → 0, we get


      −A dh/dt = B ds/dt,                             (2.17)
where ds/dt is the velocity of the water through the orifice. According to
Torricelli's law of hydrodynamics, the velocity of water issuing from the
orifice is
      ds/dt = √(2gh).                                 (2.18)
By (2.17) and (2.18) we get

      dh/dt = −(B/A)√(2gh).                           (2.19)
Differential Equation (2.19) represents the rate at which the water level is
dropping.
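Equation (2.19) is separable: writing dh/√h = −(B/A)√(2g) dt and integrating gives √h(t) = √h₀ − (B/2A)√(2g) t, so the tank empties in the finite time t = (A/B)√(2h₀/g). A sketch with illustrative dimensions (not from the text):

```python
import math

g = 9.81            # gravitational acceleration (m/s^2)
A, B = 1.0, 0.01    # tank and orifice cross sections (m^2); illustrative
h0 = 2.0            # initial water height (m)

def height(t):
    # solution of dh/dt = -(B/A)*sqrt(2*g*h): sqrt(h) decreases linearly in t
    root = math.sqrt(h0) - (B / (2 * A)) * math.sqrt(2 * g) * t
    return max(root, 0.0) ** 2

t_empty = (A / B) * math.sqrt(2 * h0 / g)   # time at which the tank runs dry
assert abs(height(0.0) - h0) < 1e-12
assert height(t_empty) < 1e-12
```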

2.2.8 Spring and Mass System


Let an external force f (t) act on a vibrating mass on a spring. This could
represent a driving force causing an oscillatory vertical motion of the support
of the spring, see Figure 2.5.
If f(t) is considered part of the formulation of Newton's second law, the
differential equation is
      m d²x/dt² = −kx − β dx/dt + f(t),
where β is a positive damping constant and the negative sign indicates that
the damping force acts in a direction opposite to the motion.

or
      d²x/dt² + 2λ dx/dt + ω²x = F(t),                (2.20)
where F(t) = f(t)/m, 2λ = β/m and ω² = k/m, represents forced motion.
It may be
observed that (2.20) is a general form of the equation

      d²x/dt² + ω²x = 0,                              (2.21)
describing simple harmonic motion, or free undamped motion. x(t) =
c1 cos ωt + c2 sin ωt is the general solution of (2.21). The period is
T = 2π/ω and the frequency is f = ω/2π.


(a) unstretched spring; (b) equilibrium position, mg − ks = 0; (c) motion
FIGURE 2.5: Spring and Mass System

2.2.9 Mixture of Salt and Payment of Loan


Let us assume that a mixing tank as shown in Figure 2.6 contains m gallons
of water in which salt has been dissolved. Suppose that another brine solution
is pumped into the tank at the rate of n gallons per minute and after proper
mixing, the mixture is pumped out at the same rate. The concentration of
salt in the inflow is p kg per gallon.
Let A(t) be the amount of salt measured in kg in the tank at any time. It
is assumed that salt is neither created nor destroyed in the tank. Therefore,
variations in the amount of salt are due solely to the flows in and out of the
tank. Therefore, the rate of change of salt in the tank, dA(t)/dt, is equal to
the rate at which salt is flowing in minus the rate at which it is flowing out;
more precisely,
      dA(t)/dt = (rate of salt entering) − (rate of salt leaving)
               = Rin − Rout .                         (2.22)

The rate at which salt enters the tank, in kg per minute, is np. The rate
at which salt leaves the tank, in kg per minute, is


FIGURE 2.6: Mixture of Salt

(n/m)A(t). Thus we obtain
      dA(t)/dt = np − (n/m)A(t).                      (2.23)
Differential Equation (2.23) models the concentration of a chemical mixture;
its solution gives the amount of salt at any time t. Concrete examples and
their solutions will be presented in Section 2.6.9.
Remark 20. The fundamental balance Equation (2.22), or differential Equa-
tion (2.23), can be applied in many situations. The model can be used to
determine the pollutant level in a lake, the amount of a drug in an organ of
the body, or the monthly payment of a loan; see Section 2.6.9.
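Since (2.23) is linear with constant coefficients, its solution is A(t) = mp + (A(0) − mp)e^(−nt/m), which tends to the steady state mp, where salt inflow and outflow balance. A sketch with illustrative numbers (a 300-gallon tank, n = 3 gal/min, p = 2 kg/gal, initially fresh water):

```python
import math

m, n, p = 300.0, 3.0, 2.0   # tank volume (gal), flow rate (gal/min), inflow conc. (kg/gal)
A0 = 0.0                    # initially no salt in the tank

def salt(t):
    # solution of dA/dt = n*p - (n/m)*A with A(0) = A0
    return m * p + (A0 - m * p) * math.exp(-n * t / m)

assert salt(0.0) == A0
assert abs(salt(1e5) - m * p) < 1e-6   # long-run amount approaches m*p = 600 kg
```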

2.2.10 Predator-Prey Model


Suppose two different species of animals interact within the same environ-
ment, and assume further that the first species eats only vegetation while the
second species eats the first. The second species is called the predator and
the first the prey. For example, assume the predators are foxes and the prey
are rabbits.
Let x(t) and y(t) denote, respectively, the fox and rabbit population at time t.
If there were no rabbits, it would be expected that fox numbers would decline
due to lack of an adequate food supply according to
      dx/dt = −αx,   α > 0.                           (2.24)
dt
Assume rabbits are present in the environment and encounters between these
two species are proportional to the product x(t)y(t). When rabbits are present

there is a supply of food and so foxes are added to the system at a rate
βxy, β > 0. Adding this rate to (2.24), we get a model for the fox population:

      dx/dt = −αx + βxy.                              (2.25)
dt
On the other hand, if there are no foxes, the rabbit population will grow at a
rate proportional to the number of rabbits present at time t:
      dy/dt = µy,   µ > 0.                            (2.26)
dt
If foxes are present, a model for the rabbit population is (2.26) decreased by
λxy, λ > 0, the rate at which the rabbits are eaten during their encounters
with the foxes:
      dy/dt = µy − λxy.                               (2.27)
dt
Equations (2.25) and (2.27) constitute a system of non-linear differential equa-
tions
      dx/dt = −αx + βxy,
      dy/dt = µy − λxy,                               (2.28)
where α, β, µ and λ are positive constants and (2.28) is called the Lotka-
Volterra predator-prey model.
Remark 21. Except for the two constant solutions x(t) = 0, y(t) = 0 and
x(t) = µ/λ, y(t) = α/β, the non-linear system (2.28) cannot be solved in
terms of elementary functions.

Interested readers may find a comprehensive discussion of systems such as
(2.28) in Brenan and Boyce [5] and Edwards and Penney [7].
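Although (2.28) has no solution in elementary functions, it is easy to integrate numerically; trajectories orbit the equilibrium (µ/λ, α/β). A minimal forward-Euler sketch with illustrative constants (production code would use a higher-order method such as Runge-Kutta):

```python
alpha, beta, mu, lam = 1.0, 0.02, 1.0, 0.02   # illustrative constants; equilibrium at (50, 50)

x, y = 30.0, 40.0    # initial fox and rabbit populations
dt = 0.0005
history = []

# forward-Euler integration of dx/dt = -alpha*x + beta*x*y, dy/dt = mu*y - lam*x*y
for _ in range(40000):
    dx = (-alpha * x + beta * x * y) * dt
    dy = (mu * y - lam * x * y) * dt
    x, y = x + dx, y + dy
    history.append((x, y))

xs = [xv for xv, _ in history]
ys = [yv for _, yv in history]
assert min(xs) > 0 and min(ys) > 0        # populations stay positive
assert min(xs) < mu / lam < max(xs)       # foxes oscillate about mu/lam = 50
assert min(ys) < alpha / beta < max(ys)   # rabbits oscillate about alpha/beta = 50
```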

2.2.11 Model of Groundwater Contaminant Source


Chlorinated solvents are common causes of environmental contamination
at industrial plants. Chlorinated organics, collectively referred to as dense
nonaqueous phase liquids (DNAPLs), are denser than water. DNAPLs tend
to accumulate as a separate phase below the water table and cause long term
groundwater contamination.
Let As = cross sectional area of the source region, vd = Darcy groundwater
velocity, m(t) = total DNAPL mass in the source region, cs (t) = concentration
(flow averaged) of dissolved contaminant leaving the source region, m0 =
initial DNAPL mass in the source region and c0 = source zone concentration
corresponding to the initial source zone mass m0 .

The equation describing the rate of DNAPL mass discharge from the source
region is
      dm/dt = −As vd cs (t),                          (2.29)
dt
while an algebraic relationship between cs (t) and m(t) is assumed in the form
of a power law,
      cs (t)/c0 = (m(t)/m0 )^γ ,                      (2.30)
where γ is determined empirically. Together, (2.29) and (2.30) yield the dif-
ferential equation
      dm/dt = −αm^γ                                   (2.31)
that models the dissolution of DNAPL into the ground water flowing through
the source region. For more details see Falta, Rao and Basu in [5].

2.2.12 Heart Pacemaker


Consider a heart pacemaker as shown in Figure 2.7 consisting of a battery,
capacitor, and the heart as resistor. When the switch is at P , the capacitor
charges; when S is at Q, the capacitor discharges, sending an electrical stim-
ulus to the heart. During this period, the voltage E applied to the heart is
given by the differential equation
      dE/dt = −(1/RC) E,   t1 < t < t2 ,              (2.32)
where R and C are constants.
The solution of this equation gives the voltage applied to the heart:
      E(t) = E0 e^((t1 − t)/RC) ,
where E0 = E(t1 ).

2.2.13 X-Ray and Beer’s Law


Beer’s law describes the behavior of X-rays, that is, the rate of change of
intensity per millimeter of a non-refractive, monochromatic, zero-width X-ray
beam passing through a medium is jointly proportional to the intensity of the
beam and to the attenuation coefficient of the medium. This can be expressed
by the differential equation
      dI/dx = −A(x)I(x),                              (2.33)

FIGURE 2.7: Model of Pacemaker

where the intensity of the X-ray beam is I(x) = E·N(x), E is the energy level
of each photon, N(x) denotes the number of photons per second passing
through the point x, and A(x) is the attenuation coefficient of the medium
through which the X-ray is passing.
We assume that an X-ray beam is composed of photons and that each
photon has the same energy level E. Furthermore every substance through
which an X-ray passes has the property that each millimeter of the substance
absorbs a certain proportion of the photons passing through it. This propor-
tion, which is specific to each substance, is called the attenuation coefficient
of that material.

2.2.14 Model of Spreading Information


Let M(t) be the size of the population in which we want to propagate infor-
mation, and let N(t) be the number of persons who know the particular piece
of information at time t. We assume that those who know the information
spread it randomly in the population, and that those who receive it become
propagators in turn. Furthermore it is assumed that each person having the
information passes it to k individuals per unit of time. It is quite possible
that some of these k persons already have the information. Over a unit of

time, each of the N(t) persons will pass the information to k persons. There-
fore the total number of persons who are given the information over the unit
of time is N(t)k. We are interested only in the new persons to whom the
information is provided. The proportion of the total population M that is
unaware of the information is (M − N)/M. Hence the total number of new
persons who receive the information over the unit of time is
      ((M − N)/M) N(t)k = (k/M(t)) N(t)(M(t) − N(t)).

Therefore
      dN/dt = (k/M) N(M − N)
is a model for the propagation of information in a population of size M(t).

2.2.15 Model for Circulation of Money


Suppose that a country has U.S. $ 200 billion of paper money in circula-
tion. Each week $ 50 million is brought into the banks for deposit, and the
same amount is paid out. The government decides to issue new paper money
whenever old money comes to the banks. Old money is destroyed and replaced
by new money. Let A(t) be the amount of old money (in millions of U.S. dol-
lars) in circulation at time t (in weeks). Then this process is modeled by the
differential equation
      dA/dt = −0.025A.
The solution of this equation gives the amount of old money still in circulation
after a given time. With the help of this model we can roughly determine the
week by which 95% of the old paper money will have been returned to the
banks.
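Solving dA/dt = −0.025A with A(0) = A₀ gives A(t) = A₀e^(−0.025t); 95% of the old notes have come back when A(t)/A₀ = 0.05, that is, after t = ln 20/0.025 ≈ 120 weeks, roughly two and a half years. A sketch of the computation:

```python
import math

rate = 0.025   # fraction of old money replaced per week (constant from the model)

def weeks_until_fraction_left(f):
    # solve A0 * exp(-rate * t) = f * A0 for t
    return math.log(1 / f) / rate

t95 = weeks_until_fraction_left(0.05)   # 95% deposited means 5% still circulating
print(round(t95, 1))                    # prints 119.8
```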

2.3 MATLAB and MATHEMATICA for Differential


Equations
2.3.1 Solving Ordinary Differential Equations (ODEs) by
MATLAB
Differential equations play an important role in applied mathematics, notably
in the modeling of physical systems.
In MATLAB, we can solve differential equations symbolically or numerically.
We start with the symbolic solution of ODEs.

Symbolic Solution:
The function dsolve computes symbolic solutions to ordinary differential
equations (Figure 2.8 and 2.9). The equations are specified by symbolic
expressions containing the letter D to denote differentiation. The symbols
D2, D3, . . . , DN , correspond to the second, third,. . . , N th derivatives respec-
tively. Thus, D2y is equivalent to d2 y/dt2 . We can specify the initial conditions
also. For example, let us solve dx/dt = x + t with initial condition x(0) = 0.
We can then write the following code:

>> x = dsolve('Dx=x+t', 'x(0)=0')

x =
-1-t+exp(t)
We can also plot this solution with the ezplot function.
We can solve higher order differential equations also. The differential equa-

FIGURE 2.8: Solution of ODE

tion to be solved is d²y/dt² = cos 2t − y, with initial conditions y(0) = 1 and
y'(0) = 0. To solve it, write the following code:
>> y = dsolve('D2y=cos(2*t)-y', 'y(0)=1', 'Dy(0)=0')
y =
4/3*cos(t)-1/3*cos(2*t)

TABLE 2.1

Solver Solves These Kinds of Problems Method


ode45 Nonstiff differential equations Runge-Kutta
ode23 Nonstiff differential equations Runge-Kutta
ode113 Nonstiff differential equations Adams
ode15s Stiff differential equations and DAEs NDFs (BDFs)
ode23s Stiff differential equations Rosenbrock
ode23t Moderately stiff differential equations Trapezoidal rule
and DAEs
ode23tb Stiff differential equations TR-BDF2
ode15i Fully implicit differential equations BDFs

Numerical Solution:
MATLAB has several built-in functions to solve various types of ODEs nu-
merically; they are summarized in Table 2.1.
We will use one simple example with ode45(). The general syntax is
>> [time, solution] = ode45('myfunction', tspan, x0)
where myfunction is the ODE, tspan is the time interval over which we want
the solution and x0 is the initial condition. Let us use the same ODE that
we used above for a symbolic solution. This ODE can be solved in two ways:
by writing a function file describing the ODE or by using an inline function.
The code is given below:
>> myode = inline('x+t');
>> [t,x] = ode45(myode, [0 2], 0);
>> plot(t,x)
We can code in an alternate form by writing a function file. The code is given
below.
% function file saved as myode.m
function dxdt = myode(t,x)
dxdt = x + t;
At the command prompt, we write the following:
>> [t,x] = ode45('myode', [0 2], 0);
>> plot(t,x)
We can also solve higher order ODEs, but we must represent them as a set
of first order ODEs. For example, let us solve d²y/dt² − (1 − y²) dy/dt + y = 0
with initial conditions y(0) = 2 and y'(0) = 0. Before coding, we have to
express the second order ODE as a set of first order ODEs. The ODE can be
written as ÿ = (1 − y²)ẏ − y . . . (1). Let x1 = y and x2 = ẋ1 . . . (2). Then
equation (1) can be written as ẋ2 = (1 − x1²)x2 − x1 . . . (3).
The equations (2) and (3) can be written as


FIGURE 2.9: Numerical Solution of ODE

      ẋ1 = x2 ,   ẋ2 = (1 − x1²)x2 − x1 ,   . . . (4)
and (4) is of the form ẋ = f (x).
Now we write a function file which will describe Equation (4). The code is
given below:
function xdot = myode(t,x)
xdot = [x(2); (1-x(1)^2)*x(2)-x(1)];
Then write the following commands at the command prompt:
>> [t,x] = ode45('myode', [0 20], [2; 0]);
>> plot(t,x)

2.3.2 Solving Differential Equation by MATHEMATICA


Solving Differential Equations Symbolically
The MATHEMATICA command DSolve[eqn, y[x], x] solves the differential
equation for the function y(x). The command DSolve[{eqn1 , eqn2 , . . . },
{y1 , y2 , . . . }, x] solves a system of differential equations. The command
DSolve[eqn, y, {x1 , x2 , . . . }] solves a partial differential equation for the func-
tion y = y(x1 , x2 , . . . ).
Example 91. Find the general solution of y''(x) + y(x) = e^x .
In[ ]:= DSolve[y''[x] + y[x] == Exp[x], y[x], x]
Out[ ] = {{y[x] -> C[1] Cos[x] + C[2] Sin[x]
          + 1/2 E^x (Cos[x]^2 + Sin[x]^2)}}


FIGURE 2.10: Symbolic Solution of ODE by MATHEMATICA

Example 92. Find the solution of y''(x) + y(x) = e^x subject to the initial
conditions y(0) = 1, y'(0) = 1.
In[ ]:= DSolve[{y''[x] + y[x] == Exp[x], y[0] == 1, y'[0] == 1},
y, x]
Out[ ] = {{y -> Function[{x}, 1/2 (Cos[x] + E^x Cos[x]^2
          + Sin[x] + E^x Sin[x]^2)]}}

Solving Differential Equations Numerically


The MATHEMATICA command NDSolve[eqns, y, {x, a, b}] finds a numeri-
cal solution to the ordinary differential equations eqns for the function y(x)
in the interval [a, b]. The command NDSolve[eqns, z, {x, a, b}; {t, c, d}] finds a
numerical solution of the partial differential equation eqns.

Example 93. Solve numerically the initial value problem (ordinary differen-
tial equation)
      y'(t) = y cos(t² + y²),   y(0) = 1.
In[ ]:= nsol = NDSolve[{y'[t] == y[t] Cos[t^2 + y[t]^2],
y[0] == 1}, y, {t, 0, 20}]
Out[ ] = {{y -> InterpolatingFunction[{{0,20}},<>]}}
We plot this equation with
In[ ]:= Plot [Evaluate[y[t]/.nsol],{t,0,20},

Ticks -> {{0,5,10,15,20},{0,1}}, PlotRange -> All,


AxesLabel -> {x,t}]


FIGURE 2.11: Numerical Solution of ODE by MATHEMATICA

2.4 Methods for Solving First Order Linear Differential


Equations
2.4.1 Method of Separation of Variables
Definition 40. A first order differential equation of the form
      dy/dx = g(x)h(y),
where g(x) and h(y) are functions of x only and of y only, respectively, is
called separable or said to have separable variables.

Procedure for solving separable differential equations



(i) If h(y) = 1, then
      dy/dx = g(x),   or   dy = g(x)dx.
Integrating both sides we get
      ∫ dy = ∫ g(x)dx + c,   or   y = ∫ g(x)dx + c,
where c is the constant of integration. We can write
      y = G(x) + c,
where G(x) is an anti-derivative (indefinite integral) of g(x).
(ii) Let dy/dx = f(x, y), where f(x, y) = g(x)h(y); that is, f(x, y) can be
written as the product of two functions, one a function of x only and the
other a function of y only. Then
      dy/dx = g(x)h(y)
can be written as
      (1/h(y)) dy = g(x)dx.
Integrating both sides we get
      ∫ p(y)dy = ∫ g(x)dx + C,
where p(y) = 1/h(y), or H(y) = G(x) + C, where H(y) and G(x) are anti-
derivatives of p(y) = 1/h(y) and g(x), respectively.
Example 94. Solve the differential equation y' = y/x.
Solution: Here g(x) = 1/x, h(y) = y, p(y) = 1/y, G(x) = ln x and
H(y) = ln y. Hence H(y) = G(x) + C gives
      ln y = ln x + ln c
      ln y − ln x = ln c     (see Appendix A for properties of ln x)
      y/x = c
      y = cx.

Example 95. Solve the initial value problem
      dy/dx = −x/y,   y(4) = 3.
Solution: Here g(x) = −x, h(y) = 1/y and p(y) = y, so H(y) = y²/2 and
G(x) = −x²/2. Then H(y) = G(x) + c gives
      y²/2 = −x²/2 + c,   or   x² + y² = 2c = c1² .
By the initial condition, 16 + 9 = c1² = 25. Thus the initial value problem
determines the circle
      x² + y² = 25.
Example 96. Solve the differential equation
      dy/dx = cos 7x.
Solution: dy = cos 7x dx. Integrating both sides, we get
      ∫ dy = ∫ cos 7x dx + c,   or   y = (sin 7x)/7 + c.

2.4.2 Linear Equations


Definition 41. A first order differential equation of the form
      a1(x) dy/dx + a0(x) y = g(x)
is called a linear equation.
If a1(x) ≠ 0, we can write this differential equation in the form
      dy/dx + P(x) y = f(x),                          (2.34)
where P(x) = a0(x)/a1(x) and f(x) = g(x)/a1(x).
(2.34) is called the standard form of a linear differential equation of the first
order.
Definition 42. e^∫P(x)dx is called the integrating factor of the standard
form of a linear differential Equation (2.34).
Procedure for solving (2.34):
Step 1: Write the given equation in the standard form (2.34) if it is not
already in this form.
Step 2: Identify P(x) and compute the integrating factor I(x) = e^∫P(x)dx .
Step 3: Multiply the standard form by I(x).
Step 4: The solution is
      y·I(x) = ∫ f(x)·I(x) dx + c.                    (2.35)

Example 97. Find the general solutions of the following differential equations:
(a) dy/dx = 9y   (b) x dy/dx + 2y = 3   (c) x dy/dx + (3x + 1)y = e^(−3x)
Solution: (a) In standard form, dy/dx − 9y = 0, so P(x) = −9 and the
integrating factor is I(x) = e^∫(−9)dx = e^(−9x) . Then
      y·e^(−9x) = ∫ 0·e^(−9x) dx + c,
or y = ce^(9x) , −∞ < x < ∞.


(b) In standard form, dy/dx + (2/x)y = 3/x, so P(x) = 2/x and the integrating
factor is
      I(x) = e^∫(2/x)dx = x² .
The solution is given by
      y·I(x) = ∫ f(x)·I(x) dx + c,
where I(x) = x² and f(x) = 3/x. Thus
      yx² = ∫ (3/x)·x² dx + c = ∫ 3x dx + c = (3/2)x² + c,
or y = 3/2 + c/x² , 0 < x < ∞.

(c) The standard form is
      dy/dx + (3 + 1/x) y = e^(−3x)/x,
so P(x) = 3 + 1/x and f(x) = e^(−3x)/x. The integrating factor is
      I(x) = e^∫P(x)dx = e^∫(3+1/x)dx = xe^(3x) .
Then
      y·xe^(3x) = ∫ (e^(−3x)/x)·xe^(3x) dx + c = ∫ dx + c = x + c,
or y = e^(−3x) + (c/x)e^(−3x) , for 0 < x < ∞.
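The general solutions obtained this way are easy to spot-check numerically by substituting them back into the equation. The sketch below (not part of the text) does this for part (c), with the constant c chosen arbitrarily:

```python
import math

c = 2.0   # arbitrary constant in the general solution

def y(x):
    # general solution of part (c): y = e^(-3x) + (c/x) e^(-3x)
    return math.exp(-3 * x) + (c / x) * math.exp(-3 * x)

def residual(x, h=1e-6):
    # substitute y back into the standard form y' + (3 + 1/x) y - e^(-3x)/x
    yprime = (y(x + h) - y(x - h)) / (2 * h)
    return yprime + (3 + 1 / x) * y(x) - math.exp(-3 * x) / x

for x in [0.5, 1.0, 2.0]:
    assert abs(residual(x)) < 1e-5   # the residual vanishes, as it should
```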
One can find a particular solution of Equation (2.34) by a procedure known
as variation of parameters (for higher order linear differential equations this
method is discussed in Section 2.5.7). The basic idea is to find a function u
so that yp = u(x)e^(−∫P(x)dx) is a solution of (2.34). Substituting yp into
(2.34) yields
      u (dy1/dx + P(x)y1) + y1 du/dx = f(x),
where y1 = e^(−∫P(x)dx) , so that
      y1 du/dx = f(x),   since   dy1/dx + P(x)y1 = 0.
This gives us
      du = (f(x)/y1(x)) dx   and   u = ∫ (f(x)/y1(x)) dx = ∫ e^∫P(x)dx f(x) dx.
Therefore yp = e^(−∫P(x)dx) ∫ e^∫P(x)dx f(x) dx, while yc = ce^(−∫P(x)dx) is
the general solution of dy/dx + P(x)y = 0. Hence
      y = yc + yp = ce^(−∫P(x)dx) + e^(−∫P(x)dx) ∫ e^∫P(x)dx f(x) dx.   (2.36)
If (2.34) has a solution then it must be of the form (2.36).



2.4.3 Exact Equations


We consider here a special kind of non-separable differential equation called
an exact differential equation. We recall that the total differential of a
function of two variables U (x, y) is given by
      dU = (∂U/∂x)dx + (∂U/∂y)dy.                     (2.37)
Definition 43. The first order differential equation
M (x, y)dx + N (x, y)dy = 0 (2.38)
is called an exact differential equation if the left hand side of (2.38) is the
total differential of some function U (x, y).
Remark 22. (a) It is clear that a differential equation of the form (2.38) is
exact if there is a function of two variables U (x, y) such that
∂U ∂U
dU = dx + dy
∂x ∂y
∂U ∂U
or = M (x, y), = N (x, y).
∂x ∂y
(b) Let M (x, y) and N (x, y) be continuous and have continuous first deriva-
tives in a rectangular region R defined by a < x < b, c < y < d. Then
a necessary and sufficient condition that M (x, y)dx + N (x, y)dy is an exact
differential is that
$$\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}, \quad\text{i.e.,}\quad M_y = N_x. \tag{2.39}$$
(c) A first order differential equation of the form (2.38) can often be made exact by multiplying with an integrating factor. If $(M_y - N_x)/N$ is a function of $x$ alone, then
$$I(x) = e^{\int \frac{M_y - N_x}{N}\,dx};$$
if $(N_x - M_y)/M$ is a function of $y$ alone, then
$$I(y) = e^{\int \frac{N_x - M_y}{M}\,dy}.$$
Procedure of Solution 2.38:
Step 1: Check whether a differential equation written in the form (2.38)
satisfies (2.39).
Step 2: If Equation (2.39) is satisfied then there exists a function f for which
$$\frac{\partial f}{\partial x} = M(x, y). \tag{2.40}$$
Integrating (2.40) with respect to $x$, while holding $y$ constant, we get
$$f(x, y) = \int M(x, y)\,dx + g(y), \tag{2.41}$$

where the arbitrary function $g(y)$ plays the role of the constant of integration.


Step 3: Differentiate (2.41) with respect to $y$ and set $\dfrac{\partial f}{\partial y} = N(x, y)$; we get
$$\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}\int M(x, y)\,dx + g'(y) = N(x, y),$$
or
$$g'(y) = N(x, y) - \frac{\partial}{\partial y}\int M(x, y)\,dx. \tag{2.42}$$
Step 4: Integrate (2.42) with respect to $y$ and substitute this value in (2.41) to obtain $f(x, y) = c$, the solution of the given equation.

Remark 23. (a) The right hand side of (2.42) is independent of the variable $x$, because
$$\frac{\partial}{\partial x}\left[N(x, y) - \frac{\partial}{\partial y}\int M(x, y)\,dx\right] = \frac{\partial N}{\partial x} - \frac{\partial}{\partial y}\left(\frac{\partial}{\partial x}\int M(x, y)\,dx\right) = \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} = 0.$$

(b) We could just as well start the above procedure with the assumption that
$$\frac{\partial f}{\partial y} = N(x, y).$$
By integrating $N(x, y)$ with respect to $y$ and differentiating the resulting expression, we would find the analogues of (2.41) and (2.42) to be, respectively,
$$f(x, y) = \int N(x, y)\,dy + h(x) \quad\text{and}\quad h'(x) = M(x, y) - \frac{\partial}{\partial x}\int N(x, y)\,dy.$$
Example 98. Check whether $x^2y^3\,dx + x^3y^2\,dy = 0$ is exact.
Solution: In view of Remark 22(b) we must check whether $\dfrac{\partial M}{\partial y} = \dfrac{\partial N}{\partial x}$, where
$$M(x, y) = x^2y^3, \qquad N(x, y) = x^3y^2,$$
$$\frac{\partial M}{\partial y} = 3x^2y^2, \qquad \frac{\partial N}{\partial x} = 3x^2y^2.$$
This shows that $\dfrac{\partial M}{\partial y} = 3x^2y^2 = \dfrac{\partial N}{\partial x}$. Hence the given equation is exact.
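The exactness test of Remark 22(b) is easy to automate. Here is a minimal sketch in Python's sympy (our choice of tool, not the book's), applied to Example 98:

```python
import sympy as sp

x, y = sp.symbols("x y")
M = x**2 * y**3
N = x**3 * y**2

# M dx + N dy = 0 is exact iff dM/dy = dN/dx  (condition (2.39))
My = sp.diff(M, y)
Nx = sp.diff(N, x)
assert My == Nx          # both equal 3*x**2*y**2, so the equation is exact
```

Swapping in any other pair $M$, $N$ turns this into a general-purpose exactness checker.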
Example 99. Determine whether the following differential equations are ex-
act. If they are exact solve them by the procedure given in this section.

(a) $(2x - 1)\,dx + (3y + 7)\,dy = 0$.

(b) (2x + y)dx − (x + 6y)dy = 0.


(c) (3x2 y + ey )dx + (x3 + xey − 2y)dy = 0.
Solution: (a) $M(x, y) = 2x - 1$, $N(x, y) = 3y + 7$.
$$\frac{\partial M}{\partial y} = 0, \qquad \frac{\partial N}{\partial x} = 0.$$
Thus $\dfrac{\partial M}{\partial y} = \dfrac{\partial N}{\partial x}$, and so the given equation is exact.
We apply the solution procedure for (2.38). Put $\dfrac{\partial f}{\partial x} = 2x - 1$. Integrating with respect to $x$, choosing $h(y)$ as the constant of integration, we get
$$f(x, y) = x^2 - x + h(y).$$
Then $h'(y) = N(x, y) = 3y + 7$, and by integrating with respect to $y$ we obtain
$$h(y) = \frac{3}{2}y^2 + 7y.$$
The solution is
$$f(x, y) = x^2 - x + \frac{3}{2}y^2 + 7y = c.$$
(b) It is not exact, as
$$M(x, y) = 2x + y, \qquad N(x, y) = -x - 6y,$$
and $\dfrac{\partial M}{\partial y} = 1 \neq \dfrac{\partial N}{\partial x} = -1$.
(c) $M(x, y) = 3x^2y + e^y$, $N(x, y) = x^3 + xe^y - 2y$,
$$\frac{\partial M}{\partial y} = 3x^2 + e^y, \qquad \frac{\partial N}{\partial x} = 3x^2 + e^y.$$
Thus $\dfrac{\partial M}{\partial y} = \dfrac{\partial N}{\partial x}$, and the equation is exact.
Applying the procedure for solving (2.38), let $\dfrac{\partial f}{\partial x} = 3x^2y + e^y$. Integrating with respect to $x$, we obtain
$$f(x, y) = x^3y + xe^y + g(y),$$
where $g(y)$ plays the role of the constant of integration. Differentiating with respect to $y$ we obtain
$$\frac{\partial f}{\partial y} = x^3 + xe^y + g'(y).$$
Setting $N(x, y) = \dfrac{\partial f}{\partial y} = x^3 + xe^y + g'(y)$ gives
$$g'(y) = -2y, \quad\text{so}\quad g(y) = -y^2.$$
Substituting this value of $g(y)$ we get
$$f(x, y) = x^3y + xe^y - y^2 = c.$$
Thus $x^3y + xe^y - y^2 = c$ is the solution of the given differential equation.
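The implicit solution found in part (c) can be double-checked by confirming that the candidate potential function reproduces both $M$ and $N$. A short sympy sketch (an illustration, not the book's own software):

```python
import sympy as sp

x, y = sp.symbols("x y")
M = 3*x**2*y + sp.exp(y)
N = x**3 + x*sp.exp(y) - 2*y

# candidate potential function from Example 99(c)
f = x**3*y + x*sp.exp(y) - y**2

# f is correct iff f_x = M and f_y = N
assert sp.simplify(sp.diff(f, x) - M) == 0
assert sp.simplify(sp.diff(f, y) - N) == 0
```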

2.5 Methods for Solving Higher Order Differential Equations
In this section we discuss the solution of differential equations of order two or more. In the following subsections the underlying theory and certain important methods are presented. The main goal is to find general solutions of higher order linear differential equations.

2.5.1 Initial Value and Boundary Value Problems


Initial value Problem: For a linear differential equation the following
problem is called an nth order initial value problem.
Find a solution of the differential equation
$$a_n(x)\frac{d^ny}{dx^n} + a_{n-1}(x)\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_0(x)y = g(x) \tag{2.43}$$
subject to
y(x0 ) = y0 , y 0 (x0 ) = y1 , . . . , y (n−1) (x0 ) = yn−1 . (2.44)
Conditions given by (2.44) are called the $n$ initial conditions. The following theorem guarantees existence and uniqueness of solutions of initial value problems.
Theorem 16. (Existence and Uniqueness of Solutions).
Let $a_n(x), a_{n-1}(x), \ldots, a_1(x), a_0(x)$ and $g(x)$ be continuous on an interval $I$, and let $a_n(x) \neq 0$ for every $x$ in $I$. If $x_0$ is any point in $I$, then there exists a unique solution $y(x)$ of the initial value problem (2.43) and (2.44) on the interval $I$.

Example 100. The initial value problem

$$6\frac{d^3y}{dx^3} + 10\frac{d^2y}{dx^2} - 8y = 0,$$
$$y(1) = 0, \quad y'(1) = 0, \quad y''(1) = 0$$
possesses the trivial solution y = 0. Since the third order equation is linear
with constant coefficients, it follows that all conditions of Theorem 16 are
satisfied. Hence y = 0 is the only solution on any interval containing x = 1.
Example 101. Check whether the function y = 3e2x + e−2x − 3x is a solution
of the initial value problem

$$\frac{d^2y}{dx^2} - 4y - 12x = 0,$$
$$y(0) = 4, \quad y'(0) = 1.$$
Here $a_2(x) = 1 \neq 0$, $a_0(x) = -4$ and $g(x) = 12x$; these are continuous on any interval $I$ containing $x = 0$. Direct substitution shows that $y = 3e^{2x} + e^{-2x} - 3x$ satisfies the differential equation and both initial conditions, so it is a solution of the initial value problem on any interval $I$ containing $x = 0$. By Theorem 16 it is also the unique solution.
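The substitution check in Example 101 can be carried out with sympy's `checkodesol`, which plugs a candidate solution into an ODE and reports the residual. This is an illustrative sketch (sympy is our assumption; the book itself uses MATLAB):

```python
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

ode = sp.Eq(y(x).diff(x, 2) - 4*y(x) - 12*x, 0)
candidate = 3*sp.exp(2*x) + sp.exp(-2*x) - 3*x

# checkodesol substitutes the candidate and simplifies the residual to 0
ok, residual = sp.checkodesol(ode, sp.Eq(y(x), candidate))
assert ok and residual == 0

# the initial conditions of Example 101 also hold
assert candidate.subs(x, 0) == 4
assert sp.diff(candidate, x).subs(x, 0) == 1
```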

Boundary Value Problem


A boundary value problem consists of solving a linear differential equation
of order two or greater in which dependent variable y or its derivatives are
specified at different points. A typical boundary value problem (BVP) is to
solve the linear differential equation of order two.

$$a_2(x)\frac{d^2y}{dx^2} + a_1(x)\frac{dy}{dx} + a_0(x)y = g(x) \tag{2.45}$$
subject to
y(a) = y0 , y(b) = y1 . (2.46)
Conditions in (2.46) are called boundary conditions.
Remark 24. (i) A solution of BVP [(2.45) and (2.46)] is a function φ(x)
satisfying the differential Equation (2.45) on some interval I, containing a
and b, whose graph passes through the two points (a, y0 ) and (b, y1 ).
(ii) For a second order differential equation, other pairs of boundary conditions
could be
y 0 (a) = y0 , y(b) = y1 ,
y(a) = y0 , y 0 (b) = y1 ,
y 0 (a) = y0 , y 0 (b) = y1 .

Example 102. Show that the following boundary value problem

$$\frac{d^2y}{dx^2} + 16y = 0,$$
$$y(0) = 0, \quad y\!\left(\frac{\pi}{2}\right) = 0$$
has infinitely many solutions.
Solution: It can be checked that

y = c1 cos 4x + c2 sin 4x

is a solution of the equation
$$\frac{d^2y}{dx^2} + 16y = 0.$$
Now $y(0) = 0 = c_1\cos 0 + c_2\sin 0$ gives $c_1 = 0$, and $y\!\left(\frac{\pi}{2}\right) = 0 = c_2\sin\!\left(4\cdot\frac{\pi}{2}\right) = c_2\sin 2\pi = 0$ holds for any choice of $c_2$.
Hence the boundary value problem

$$\frac{d^2y}{dx^2} + 16y = 0,$$
$$y(0) = 0, \quad y\!\left(\frac{\pi}{2}\right) = 0$$
has infinitely many solutions.
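The "infinitely many solutions" claim of Example 102 can be checked symbolically: every member of the family $y = c_2\sin 4x$ satisfies both the equation and both boundary conditions. A hedged sympy sketch:

```python
import sympy as sp

x, c2 = sp.symbols("x c2")
y = c2*sp.sin(4*x)       # c1 = 0 is forced by y(0) = 0; c2 stays free

# satisfies y'' + 16 y = 0 for every value of c2 ...
assert sp.simplify(sp.diff(y, x, 2) + 16*y) == 0

# ... and both boundary conditions, so the BVP has infinitely many solutions
assert y.subs(x, 0) == 0
assert y.subs(x, sp.pi/2) == 0
```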

2.5.2 Homogeneous Equations


A linear nth-order differential equation of the form
$$a_n(x)\frac{d^ny}{dx^n} + a_{n-1}(x)\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = g(x), \tag{2.47}$$
where $g(x)$ is not identically zero, is called a non-homogeneous nth order differential equation. If $g(x) = 0$, then
$$a_n(x)\frac{d^ny}{dx^n} + a_{n-1}(x)\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = 0 \tag{2.48}$$
is called a homogeneous linear differential equation of nth order. We shall
see in the latter part of this chapter that in order to solve a non homoge-
neous differential Equation (2.47), we must be able to solve the associated
homogeneous Equation (2.48). Here we discuss the general solution of (2.48).
Throughout the discussion we assume that
• ai (x), i = 0, 1, 2, . . . , n are continuous.
154 Modern Engineering Mathematics

• $g(x) = 0$, as is the case for a homogeneous equation.


• $a_n(x) \neq 0$ for every $x$ on which the solution is considered.
Differential Operator
Let $\dfrac{dy}{dx} = Dy$. The symbol $D$ is called a differential operator because it transforms a differentiable function into another function. We can write
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = D(Dy) = D^2y,$$
that is, $D$ acting (operating) twice on $y$. Continuing this process we can write
$$\frac{d^ny}{dx^n} = D^ny = D(D^{n-1}y).$$
We define an nth order differential operator to be
$$L = a_n(x)D^n + a_{n-1}(x)D^{n-1} + \cdots + a_1(x)D + a_0(x). \tag{2.49}$$
By the two basic properties of differentiation, we have
D(αf (x)) = αDf (x),
where α is a constant, and
D{f (x) + g(x)} = Df (x) + Dg(x).
This means that the differential operator L possesses a linearity property; that
is, L operating on a linear combination of two differentiable functions is the
same as the linear combination of L operating on the individual functions. In
symbol language, this means
L{αf (x) + βg(x)} = αL(f (x)) + βL(g(x)) (2.50)
where α and β are constants. In view of (2.50) L is called a linear operator.
The homogeneous Equation (2.48) can be expressed in terms of the $D$ notation as $L(y) = 0$, where $L$ is given by (2.49). Similarly, (2.47) can be written as $L(y) = g(x)$.

Superposition Principle
The following theorem tells us that the sum or superposition of two or more
solutions of (2.48) is also a solution of (2.48).
Theorem 17. Let $y_1, y_2, y_3, \ldots, y_n$ be solutions of (2.48) on an interval $I$.
Then the linear combination
y = α1 y1 + α2 y2 + · · · + αn yn ,
where αi , i = 1, 2, 3 . . . , n, are arbitrary constants, is also a solution of (2.48)
on I.
Differential Equations 155

Linear Dependence and Linear Independence


Definition 44. A set of functions f1 (x), f2 (x), . . . , fn (x) is said to be linearly
independent on an interval I if the only constants for which
$$\alpha_1 f_1(x) + \alpha_2 f_2(x) + \cdots + \alpha_n f_n(x) = 0$$
for every x in the interval are α1 = α2 = · · · = αn = 0.
A set of functions which is not linearly independent is called linearly
dependent.
Remark 25. An equivalent formulation for linearly dependent set: A set of
functions $f_1(x), f_2(x), \ldots, f_n(x)$ is linearly dependent on an interval $I$ if
there exist constants α1 , α2 , . . . , αn not all zero, such that
α1 f1 (x) + α2 f2 (x) + · · · + αn fn (x) = 0
for every x in the interval.
Let the set consist of two functions only, say $f_1(x)$ and $f_2(x)$, and suppose the set is linearly dependent. Then, assuming $\alpha_1 \neq 0$, $f_1(x) = -\dfrac{\alpha_2}{\alpha_1}f_2(x)$; that is, $f_1(x)$ is a constant multiple of $f_2(x)$. Thus if a set of two functions is linearly dependent, then one must be a constant multiple of the other. Conversely, let $f_1(x) = \alpha_2 f_2(x)$ for some constant $\alpha_2$. Then $-f_1(x) + \alpha_2 f_2(x) = 0$ for every $x$ in the interval, and since the coefficient $\alpha_1 = -1$ is not zero, the set is linearly dependent. We conclude that a set of two functions is linearly independent exactly when neither function is a constant multiple of the other on the interval.
Example 103. (i) Let f1 (x) = sin 2x, f2 (x) = sin x cos x. The set of f1 (x)
and f2 (x) is linearly dependent on (−∞, ∞) as f1 (x) is a constant multiple
of f2 (x) : f1 (x) = sin 2x = 2 sin x cos x on (−∞, ∞).
(ii) Let f1 (x) = ex , f2 (x) = 5ex . The set {f1 (x), f2 (x)} is linearly dependent.
(iii) Let f1 (x) = 2 + x, f2 (x) = 2 + |x|.
{f1 (x), f2 (x)} is linearly independent as f1 (x) and f2 (x) cannot be multiples
of each other.
Now we mention results characterizing linearly independent solutions of
(2.48) in terms of Wronskian determinants.
Definition 45. Let each of the functions $f_1(x), f_2(x), \ldots, f_n(x)$ possess at least $n - 1$ derivatives. The determinant
$$W(f_1, f_2, \ldots, f_n) = \begin{vmatrix} f_1 & f_2 & \cdots & f_n \\ f_1' & f_2' & \cdots & f_n' \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)} & f_2^{(n-1)} & \cdots & f_n^{(n-1)} \end{vmatrix}, \tag{2.51}$$
where the primes denote derivatives, is called the Wronskian of the functions.

Theorem 18. Let $y_1, y_2, \ldots, y_n$ be $n$ solutions of (2.48) on an interval $I$. Then the set of solutions is linearly independent on $I$ if and only if $W(y_1, y_2, \ldots, y_n) \neq 0$ for every $x$ in the interval.
Definition 46. (Fundamental set of solutions). Any set $y_1, y_2, \ldots, y_n$ of $n$ linearly independent solutions of (2.48) on an interval $I$ is said to be a fundamental set of solutions on the interval.

Theorem 19. (Existence of a fundamental set) There exists a fundamental


set of solutions of (2.48) on an interval I.
Theorem 20. (General solution) Let y1 , y2 , . . . , yn be a fundamental set of
solutions of (2.48) on an interval I. Then

y = α1 y1 (x) + α2 y2 (x) + · · · + αn yn (x)

where $\alpha_i$, $i = 1, 2, \ldots, n$, are arbitrary constants, is called the general solution of (2.48).
Remark 26. Theorem 20 states that for any solution y(x) of (2.48) on an
interval I, c1 , c2 , . . . , cn can be found such that

y(x) = c1 y1 (x) + c2 y2 (x) + · · · + cn yn (x).

Example 104. The set consisting of $e^{-3x}$ and $e^{4x}$ is a fundamental set of solutions of the differential equation $y'' - y' - 12y = 0$ on $(-\infty, \infty)$. Indeed, $y = e^{-3x}$ is a solution, since
$$y'' - y' - 12y = 9e^{-3x} + 3e^{-3x} - 12e^{-3x} = 0,$$
and $y = e^{4x}$ is also a solution, since
$$y'' - y' - 12y = 16e^{4x} - 4e^{4x} - 12e^{4x} = 0.$$
The set $\{e^{-3x}, e^{4x}\}$ is linearly independent, as $\dfrac{e^{-3x}}{e^{4x}} = e^{-7x}$ is a non-constant function; in other words, neither solution is a constant multiple of the other. Also,
$$W(y_1, y_2) = \begin{vmatrix} e^{-3x} & e^{4x} \\ -3e^{-3x} & 4e^{4x} \end{vmatrix} = 4e^{x} + 3e^{x} = 7e^{x} \neq 0.$$
Therefore, $\{e^{-3x}, e^{4x}\}$ is a fundamental set of solutions on the interval $(-\infty, \infty)$.
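The Wronskian computation of Example 104 is a two-by-two determinant, so it is easy to reproduce symbolically. A minimal sympy sketch (our tool choice, not the book's):

```python
import sympy as sp

x = sp.symbols("x")
y1, y2 = sp.exp(-3*x), sp.exp(4*x)

# Wronskian as the determinant of Definition 45
W = sp.Matrix([[y1, y2],
               [sp.diff(y1, x), sp.diff(y2, x)]]).det()
assert sp.simplify(W - 7*sp.exp(x)) == 0   # 7e^x, never zero

# both functions really solve y'' - y' - 12y = 0
for f in (y1, y2):
    assert sp.simplify(sp.diff(f, x, 2) - sp.diff(f, x) - 12*f) == 0
```

sympy also ships a ready-made `wronskian` helper; building the matrix by hand mirrors formula (2.51) directly.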

2.5.3 Non-homogeneous Equations


Any function yp , free of arbitrary parameters, that satisfies (2.47) is said
to be a particular solution or particular integral of the equation.

Theorem 21. Let yp be any particular solution of the non-homogeneous linear


nth order differential Equation (2.47) on an interval I, and let y1 , y2 , . . . , yn
be a fundamental set of solutions of the associated homogeneous differential
Equation (2.48) on I. Then the general solution of the equation on the
interval I is
$$y = c_1y_1(x) + c_2y_2(x) + \cdots + c_ny_n(x) + y_p, \tag{2.52}$$
where ci , i = 1, 2, . . . , n are arbitrary constants.
The linear combination $y_c(x) = c_1y_1(x) + c_2y_2(x) + \cdots + c_ny_n(x)$, which is the general solution of (2.48), is called a complementary function for the non-homogeneous differential Equation (2.47). Thus, in order to solve (2.47) we first solve the associated homogeneous linear differential Equation (2.48) and then find a particular solution of (2.47). The general solution of (2.47) is
$$y = \text{complementary function} + \text{any particular solution} = y_c + y_p. \tag{2.53}$$

Example 105. y = c1 e2x + c2 e5x + 6ex is the general solution of the non-
homogeneous differential equation

y 00 − 7y 0 + 10y = 24ex on (−∞, ∞).

Verification: We are required to check that $y_c(x) = c_1e^{2x} + c_2e^{5x}$ is the general solution of $y'' - 7y' + 10y = 0$ and that $y_p = 6e^x$ is a particular solution of $y'' - 7y' + 10y = 24e^x$. We have
$$y_c'(x) = 2c_1e^{2x} + 5c_2e^{5x}, \qquad y_c''(x) = 4c_1e^{2x} + 25c_2e^{5x},$$
$$y_c'' - 7y_c' + 10y_c = (4c_1e^{2x} + 25c_2e^{5x}) - 7(2c_1e^{2x} + 5c_2e^{5x}) + 10c_1e^{2x} + 10c_2e^{5x} = (14c_1e^{2x} - 14c_1e^{2x}) + (35c_2e^{5x} - 35c_2e^{5x}) = 0.$$
Thus $y_c(x)$ is the general solution of $y'' - 7y' + 10y = 0$. We also have
$$y_p' = 6e^x, \quad y_p'' = 6e^x, \quad\text{so}\quad y_p'' - 7y_p' + 10y_p = 6e^x - 42e^x + 60e^x = 24e^x;$$
that is, $y_p = 6e^x$ is a particular solution of $y'' - 7y' + 10y = 24e^x$.



2.5.4 Reduction of Order
Let
$$a_2(x)y'' + a_1(x)y' + a_0(x)y = 0 \tag{2.54}$$
be a linear second order homogeneous differential equation. The main idea is to describe a procedure that reduces (2.54) to a linear first order differential equation.
Theorem 22. If $y_1$ is a nontrivial solution of the second order differential Equation (2.54), then the substitution $y_2(x) = y_1(x)u(x)$, followed by the substitution $w(x) = u'(x)$, reduces it to a first order linear differential equation.
Remark 27. (i) The first order linear differential equation obtained in Theorem 22 can be solved by computing an integrating factor $I(x) = e^{\int P(x)\,dx}$ (see Section 2.4.2).
(ii) This procedure also holds for higher order linear differential equations.
Example 106. Let $y_1 = e^x$ be a solution of $y'' - y = 0$ on the interval $(-\infty, \infty)$. Use reduction of order to find a second solution $y_2$.
Solution: Let $y_2(x) = y_1(x)u(x) = u(x)e^x$. Differentiating this product function we get

$$y_2'(x) = ue^x + u'(x)e^x, \tag{2.55}$$
$$y_2''(x) = ue^x + 2u'(x)e^x + u''(x)e^x. \tag{2.56}$$

Therefore, $y_2''(x) - y_2(x) = e^x(u'' + 2u') = 0$. Since $e^x \neq 0$, we get

u00 + 2u0 = 0.

By substituting u0 = w in this equation we get

w0 + 2w = 0.

This is a linear first order differential equation. Applying the integrating factor $e^{\int 2\,dx} = e^{2x}$, we can write
$$\frac{d}{dx}\left(e^{2x}w\right) = 0.$$
By integrating we obtain

e2x w = c1 or w = u0 = c1 e−2x .

Integrating again with respect to x we get


−c1 −2x
u= e + c2 .
2
Thus
 
−c1 −2x
y2 (x) = u(x)ex = e + c2 ex
2
−c1 −x
= e + c2 ex .
2

By choosing $c_2 = 0$ and $c_1 = -2$, we get $y_2(x) = e^{-x}$. Since
$$W(y_1, y_2) = \begin{vmatrix} e^x & e^{-x} \\ e^x & -e^{-x} \end{vmatrix} = -e^0 - e^0 = -2 \neq 0 \quad\text{for every } x \in (-\infty, \infty),$$
the solutions are linearly independent on this interval.
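The key step of Example 106, that the ansatz $y_2 = u(x)e^x$ collapses $y'' - y = 0$ into $u'' + 2u' = 0$, can be confirmed symbolically. An illustrative sympy sketch (sympy is our assumption; the text itself references MATLAB):

```python
import sympy as sp

x = sp.symbols("x")
u = sp.Function("u")
y1 = sp.exp(x)                    # known nontrivial solution of y'' - y = 0

y2 = u(x)*y1                      # reduction-of-order ansatz y2 = u(x)*y1
reduced = sp.expand((sp.diff(y2, x, 2) - y2)/y1)

# all x-dependence cancels, leaving a first order equation in w = u'
target = sp.diff(u(x), x, 2) + 2*sp.diff(u(x), x)
assert sp.simplify(reduced - target) == 0
```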

2.5.5 Homogeneous Linear Equations with Constant Coefficients
We consider in this section equations of the type
$$a_n\frac{d^ny}{dx^n} + a_{n-1}\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_2\frac{d^2y}{dx^2} + a_1\frac{dy}{dx} + a_0y = 0, \tag{2.57}$$
where the coefficients $a_n, a_{n-1}, \ldots, a_2, a_1, a_0$ are real constants and $a_n \neq 0$. We focus mainly on the case $n = 2$; a similar discussion is possible for higher orders. It is interesting to note that all solutions of (2.57), for any $n$ in general and $n = 2$ in particular, are exponential functions or are constructed from exponential functions.
Let us consider the special case n = 2 of (2.57) of the form
ay 00 + by 0 + cy = 0. (2.58)
If we try a solution of the form y = emx , then after substituting y 0 = memx
and y 00 = m2 emx Equation (2.58) gives us
$$am^2e^{mx} + bme^{mx} + ce^{mx} = 0, \quad\text{or}\quad e^{mx}(am^2 + bm + c) = 0.$$
Since $e^{mx} \neq 0$ for all $x$,
$$am^2 + bm + c = 0; \tag{2.59}$$
(2.59) is called the auxiliary equation. Equation (2.58) is satisfied by the roots of (2.59):

$$m_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad m_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$
We know that (i) m1 and m2 are real and distinct if b2 − 4ac > 0. (ii) m1 and
m2 are real and equal if b2 − 4ac = 0. (iii) m1 and m2 are conjugate complex
numbers if b2 − 4ac < 0.
Case (i) Distinct Real Roots
Let m1 and m2 be two distinct real roots of (2.59). We find two solutions
y1 = em1 x and y2 = em2 x .

We can check that y1 and y2 are linearly independent on (−∞, ∞) and form
a fundamental set;
y = c1 e m 1 x + c2 e m 2 x (2.60)
is the general solution of (2.58).
Case (ii) Repeated Roots
If $m_1 = m_2$, we obtain only one exponential solution $y_1 = e^{m_1x}$. A second solution is
$$y_2 = e^{m_1x}\int \frac{e^{2m_1x}}{e^{2m_1x}}\,dx = e^{m_1x}\int dx = xe^{m_1x}.$$
In this computation we have used $-b/a = 2m_1$. The general solution in this case is
$$y = c_1e^{m_1x} + c_2xe^{m_1x}. \tag{2.61}$$

Case (iii) Conjugate Complex Roots


If m1 and m2 are complex, then m1 = α + iβ and m2 = α − iβ, where α
and β are real and > 0. y1 = c1 e(α+iβ)x and y2 = c2 e(α−iβ)x are two linearly
independent solutions.
Thus y = y1 + y2 = c1 e(α+iβ)x + c2 e(α−iβ)x is the general solution of (2.58). y
is in complex form. By applying Euler’s formula
eiθ = cos θ + i sin θ
where θ is any real number, we write the general solution in real form. From
this formula it follows that
eiβx = cos βx + i sin βx
eiβx + e−iβx
cos β =
2
eiβx − e−iβx
sin β = .
2i
Since y = c1 e(α+iβ)x + c2 e(α−iβ)x is a solution of (2.58) for every choice of c1
and c2 , choices c1 = c2 = 1 and c1 = 1 and c2 = −1 give two solutions:
y3 = e(α+iβ)x + e(α−iβ)x
y4 = e(α+iβ)x − e(α−iβ)x .
But y3 = eαx (eiβx + e−iβx ) = 2eαx cos βx and y4 = eαx (eiβx − e−iβx ) =
2eαx sin βx.
By Theorem 16, eαx cos βx and eαx sin βx are real solutions of (2.58). More-
over, these solutions form a fundamental set on (−∞, ∞). Consequently the
general solution is
y = c1 eαx cos βx + c2 eαx sin βx = eαx (c1 cos βx + c2 sin βx). (2.62)

Example 107. Solve the following differential equations:


(i) $2y'' - 5y' - 3y = 0$.
(ii) $y'' + 5y' - 6y = 0$.
(iii) $y'' + 8y' + 16y = 0$.
(iv) $y'' + 4y' + 7y = 0$.
Solution of (i): The auxiliary equation is $2m^2 - 5m - 3 = 0$, which can be written as
$$(2m + 1)(m - 3) = 0.$$
Therefore the two roots are $m_1 = -\frac{1}{2}$, $m_2 = 3$, and the solution is of form (2.60), that is,
$$y = c_1e^{-x/2} + c_2e^{3x}.$$

(ii) The auxiliary equation is $m^2 + 5m - 6 = 0$. This can be written in the form
$$(m - 1)(m + 6) = 0.$$
The roots are $m_1 = 1$, $m_2 = -6$. The solution is of form (2.60), that is,
$$y = c_1e^x + c_2e^{-6x}.$$

(iii) The auxiliary equation is $m^2 + 8m + 16 = 0$. The roots are $m_1 = m_2 = -4$. The solution is of form (2.61), that is,
$$y = c_1e^{m_1x} + c_2xe^{m_1x} = c_1e^{-4x} + c_2xe^{-4x}.$$

(iv) The auxiliary equation is $m^2 + 4m + 7 = 0$. The roots are $m_1 = -2 + i\sqrt{3}$, $m_2 = -2 - i\sqrt{3}$. The solution is of form (2.62), that is,
$$y = e^{-2x}(c_1\cos\sqrt{3}x + c_2\sin\sqrt{3}x).$$

Example 108. Solve the initial value problem
$$y'' + 3y' + 2y = 0,$$
$$y(0) = 1, \quad y'(0) = 2.$$
Solution: The auxiliary equation is
$$m^2 + 3m + 2 = 0.$$
The roots are
$$m_1 = -1 \quad\text{and}\quad m_2 = -2.$$

Therefore, the solution is of form (2.60), that is,
$$y = c_1e^{-x} + c_2e^{-2x}.$$
To find $c_1$ and $c_2$ we use the initial conditions $y(0) = 1$ and $y'(0) = 2$:
$$y(0) = 1 = c_1e^{0} + c_2e^{0}, \quad\text{or}\quad c_1 + c_2 = 1,$$
$$y' = -c_1e^{-x} - 2c_2e^{-2x}, \qquad y'(0) = -c_1 - 2c_2 = 2, \quad\text{or}\quad c_1 + 2c_2 = -2.$$
Thus $c_1 + c_2 = 1$ and $c_1 + 2c_2 = -2$, which gives $c_2 = -3$ and $c_1 = 4$. Therefore, $y = 4e^{-x} - 3e^{-2x}$.
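Initial value problems like Example 108 can be solved end to end with sympy's `dsolve`, which accepts the initial conditions through its `ics` argument. An illustrative sketch (our tool choice, not the book's):

```python
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

# the IVP of Example 108 (auxiliary equation m^2 + 3m + 2 = 0)
ode = sp.Eq(y(x).diff(x, 2) + 3*y(x).diff(x) + 2*y(x), 0)
sol = sp.dsolve(ode, y(x),
                ics={y(0): 1, y(x).diff(x).subs(x, 0): 2})

# matches the hand computation y = 4e^{-x} - 3e^{-2x}
assert sp.simplify(sol.rhs - (4*sp.exp(-x) - 3*sp.exp(-2*x))) == 0
```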
Remark 28. In general, to solve an nth order differential Equation (2.57) we
must solve an nth degree polynomial equation:

an mn + an−1 mn−1 + · · · + a2 m2 + a1 m + a0 = 0.

If all roots (say m1 , m2 , . . . , mn ) of this equation are real and distinct, then
the general solution of (2.57) is

y = c1 em1 x + c2 em2 x + · · · + cn emn x .

It is difficult to summarize the other two cases because the roots of an auxiliary equation of degree greater than 2 can occur in many combinations.

2.5.6 Method of Undetermined Coefficients


To solve a non-homogeneous linear differential equation with constant coefficients
$$a_n\frac{d^ny}{dx^n} + a_{n-1}\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_2\frac{d^2y}{dx^2} + a_1\frac{dy}{dx} + a_0y = g(x), \tag{2.63}$$
one must find the complementary function $y_c$, that is, the general solution of (2.57) (see Theorem 21), and a particular solution of (2.63). A process of finding a particular solution $y_p$ of (2.63) is known as the method of undetermined coefficients. The underlying idea in this method is to make a guess about the form of $y_p$ that is motivated by the form of $g(x)$ in (2.63). The method is limited to equations of the type (2.63) where $g(x)$ is of the following forms:

(i) g(x) is constant.

(ii) $g(x)$ is a polynomial function, that is, of the form
$$g(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n.$$

(iii) g(x) = eαx , exponential function.


(iv) $g(x) = \sin\beta x$ or $\cos\beta x$, or finite sums and products of these functions.

It may be observed that this method is not applicable in cases where
$$g(x) = \ln x, \quad g(x) = \frac{1}{x}, \quad g(x) = \tan x, \quad g(x) = \sin^{-1}x, \text{ etc.}$$
The method is illustrated through the following examples.
Example 109. Find a general form of a particular solution yp for the follow-
ing equations

(a) 3y 00 + 2y = 5e2x + 2x3 .


(b) 3y 00 + 2y = x2 e−3x .
(c) 3y 00 + 2y = 20 sin 2x.

Solution: (a) The particular solution yp will be of the form

yp = Ae2x + B + Cx + Dx2 + Ex3 .

(b) The general form of yp will be of the form

yp = Ae−3x + Bxe−3x + Cx2 e−3x .

(c) yp = A sin 2x + B cos 2x.


Example 110. Find a particular solution yp of differential equation

y 00 − y 0 + y = 2 sin 3x.

Solution: A natural first guess for a particular solution would be A sin 3x.
Since successive differentiation of sin 3x produces cos 3x and sin 3x, we are
prompted instead to assume a particular solution that includes both of these
terms:
yp = A cos 3x + B sin 3x.
Differentiating yp and substituting the results into the differential equation
gives:
yp0 = −3A sin 3x + 3B cos 3x.
yp00 = −9A cos 3x − 9B sin 3x.

$$y_p'' - y_p' + y_p = -9A\cos 3x - 9B\sin 3x + 3A\sin 3x - 3B\cos 3x + A\cos 3x + B\sin 3x$$
$$= (-8A - 3B)\cos 3x + (3A - 8B)\sin 3x = 2\sin 3x + 0\cos 3x.$$
Comparing coefficients of $\cos 3x$ and $\sin 3x$ we get
$$-8A - 3B = 0, \qquad 3A - 8B = 2.$$
Solving for $A$ and $B$ we get
$$A = \frac{6}{73}, \qquad B = -\frac{16}{73}.$$
Thus, a particular solution $y_p$ is given by
$$y_p = \frac{6}{73}\cos 3x - \frac{16}{73}\sin 3x.$$
Example 111. Find a particular solution of $y'' + 3y' + 2y = 5x^2$.
Solution: We guess that $y_p$ is of the form
$$y_p = A + Bx + Cx^2, \qquad y_p' = B + 2Cx, \qquad y_p'' = 2C.$$
Then
$$y_p'' + 3y_p' + 2y_p = 2C + 3B + 6Cx + 2A + 2Bx + 2Cx^2 = 5x^2,$$
or
$$(2C + 3B + 2A) + (6C + 2B)x + 2Cx^2 = 0 + 0x + 5x^2.$$
This implies that
$$2A + 3B + 2C = 0, \qquad 2B + 6C = 0, \qquad 2C = 5.$$
Thus $C = \dfrac{5}{2}$, $B = -\dfrac{15}{2}$, $A = -\dfrac{3}{2}B - C = \dfrac{45}{4} - \dfrac{5}{2} = \dfrac{35}{4}$. Therefore
$$y_p = \frac{35}{4} - \frac{15}{2}x + \frac{5}{2}x^2.$$
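The coefficients found in Example 111 can be verified by substituting the particular solution back into the left side of the equation. A short sympy sketch (an illustration, not the book's own software):

```python
import sympy as sp

x = sp.symbols("x")
# particular solution from Example 111: 35/4 - (15/2)x + (5/2)x^2
yp = sp.Rational(35, 4) - sp.Rational(15, 2)*x + sp.Rational(5, 2)*x**2

# substitute into y'' + 3y' + 2y and compare with the right side 5x^2
lhs = sp.diff(yp, x, 2) + 3*sp.diff(yp, x) + 2*yp
assert sp.expand(lhs - 5*x**2) == 0
```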
Example 112. Solve the differential equation y 00 − 10y 0 + 25y = 30x + 3 by
undetermined coefficients.

Solution: Step 1. Find the complementary function of $y'' - 10y' + 25y = 0$. Step 2. Find $y_p$.
Step 1. The auxiliary equation is
$$m^2 - 10m + 25 = 0, \quad\text{or}\quad (m - 5)^2 = 0,$$
so $m_1 = m_2 = 5$. The complementary function is of form (2.61), that is,
$$y_c = c_1e^{5x} + c_2xe^{5x}.$$
Step 2. Let $y_p = Ax + B$, so $y_p' = A$ and $y_p'' = 0$. Then
$$0 - 10A + 25(Ax + B) = 30x + 3, \quad\text{or}\quad (-10A + 25B) + 25Ax = 30x + 3.$$
This implies
$$-10A + 25B = 3, \qquad 25A = 30.$$
Thus $A = \dfrac{6}{5}$, and $-10\cdot\dfrac{6}{5} + 25B = 3$ gives $B = \dfrac{3}{5}$. Hence
$$y_p = \frac{6}{5}x + \frac{3}{5}.$$
The general solution is
$$y = c_1e^{5x} + c_2xe^{5x} + \frac{6}{5}x + \frac{3}{5}.$$
Example 113. Solve the differential equation y 00 + 4y = 3 sin 2x by undeter-
mined coefficients.
Solution: Step 1. Find the complementary function. The auxiliary equation is
$$m^2 + 4 = 0, \quad\text{so}\quad m = \pm 2i.$$
The complementary function is of form (2.62), that is,
$$y_c(x) = e^{0x}(c_1\cos 2x + c_2\sin 2x) = c_1\cos 2x + c_2\sin 2x.$$



Step 2. Find a particular solution $y_p$. Since $\sin 2x$ already appears in $y_c$, we take
$$y_p = Ax\sin 2x + Bx\cos 2x,$$
$$y_p' = A\sin 2x + 2Ax\cos 2x + B\cos 2x - 2Bx\sin 2x,$$
$$y_p'' = 4A\cos 2x - 4Ax\sin 2x - 4B\sin 2x - 4Bx\cos 2x,$$
so that
$$y_p'' + 4y_p = (4A\cos 2x - 4Ax\sin 2x - 4B\sin 2x - 4Bx\cos 2x) + (4Ax\sin 2x + 4Bx\cos 2x) = 3\sin 2x,$$
or $4A\cos 2x - 4B\sin 2x = 3\sin 2x$. Hence
$$-4B = 3 \quad\text{or}\quad B = -\frac{3}{4}, \qquad A = 0,$$
$$y_p = -\frac{3}{4}x\cos 2x,$$
$$y = y_c + y_p = c_1\cos 2x + c_2\sin 2x - \frac{3}{4}x\cos 2x.$$
Undetermined Coefficients (Annihilator Approach): Differential Equation (2.63) can be written in terms of the operators $D, D^2, D^3, \ldots, D^n$ as
$$Ly = g(x), \tag{2.78}$$
where
$$L = a_nD^n + a_{n-1}D^{n-1} + \cdots + a_1D + a_0. \tag{2.79}$$


L is said to be an annihilator operator of a function f if

L(f (x)) = 0

where f (x) is sufficiently differentiable.


The differential operator Dn annihilates each of the functions

1, x, x2 , . . . , xn−1 . (2.80)

The differential operator (D −α)n , where α is a real constant, annihilates each


of the functions
eαx , xeαx , x2 eαx , . . . , xn−1 eαx . (2.81)

Example 114. Find a differential operator that annihilates the function

4e2x − 10xe2x .

Solution: With $n = 2$ and $\alpha = 2$, $(D - 2)^2$ is a differential operator which annihilates $4e^{2x} - 10xe^{2x}$, that is,
$$(D - 2)^2(4e^{2x} - 10xe^{2x}) = 0.$$
The differential operator $[D^2 - 2\alpha D + (\alpha^2 + \beta^2)]^n$ annihilates each of the functions
$$e^{\alpha x}\cos\beta x,\ xe^{\alpha x}\cos\beta x,\ x^2e^{\alpha x}\cos\beta x,\ \ldots,\ x^{n-1}e^{\alpha x}\cos\beta x,$$
$$e^{\alpha x}\sin\beta x,\ xe^{\alpha x}\sin\beta x,\ x^2e^{\alpha x}\sin\beta x,\ \ldots,\ x^{n-1}e^{\alpha x}\sin\beta x. \tag{2.82}$$
Remark 29. (i) If $L$ annihilates $y_1$ and $y_2$, then it also annihilates any linear combination $\alpha y_1 + \beta y_2$, where $\alpha$ and $\beta$ are real numbers.
(ii) Let $L_1$ and $L_2$ be annihilator operators for $y_1$ and $y_2$ respectively, with $L_1(y_2) \neq 0$ and $L_2(y_1) \neq 0$. Then $L_1L_2$ annihilates $\alpha y_1 + \beta y_2$.
Steps for Solution:
(i) Find the complementary solution yc of L(y) = 0.
(ii) Operate on both sides of L(y) = g(x) with a differential operator L1
that annihilates the function g(x).
(iii) Find the general solution of the higher order homogeneous differential equation $L_1L(y) = 0$.
(iv) Delete from the solution in step (iii) all terms that are duplicated in the complementary solution $y_c$ found in step (i). Form a linear combination $y_p$ of the terms that remain. This is the form of a particular solution of $L(y) = g(x)$.
(v) Substitute yp found in step (iv) into L(y) = g(x). Match coefficients of
the various functions on each side of the equality, and solve the resulting
system of equations for the unknown coefficients in yp .
(vi) With the particular solution found in step (v), form the general solution
y = yc + yp of the given differential equation.
Example 115. Solve y 00 + 3y 0 + 2y = 4x2 using undetermined coefficients.
Solution: Step 1. Solve the homogeneous equation
y 00 + 3y 0 + 2y = 0.
The auxiliary equation is
$$m^2 + 3m + 2 = 0.$$
Roots of this equation are m1 = −1 and m2 = −2, and so complementary
function is of form (2.60), that is,
yc = c1 e−x + c2 e−2x .

Step 2. Now, since 4x2 is annihilated by the differential operator D3 , we find


that
D3 (D2 + 3D + 2)y = 4D3 x2
is the same as
D3 (D2 + 3D + 2)y = 0. (2.83)
The auxiliary equation of the fifth order in (2.83),

m3 (m2 + 3m + 2) = 0

or m3 (m + 1)(m + 2) = 0,
has roots $m_1 = m_2 = m_3 = 0$, $m_4 = -1$, and $m_5 = -2$.
Thus its general solution must be

y = c1 + c2 x + c3 x2 + c4 e−x + c5 e−2x . (2.84)

The terms $c_4e^{-x} + c_5e^{-2x}$ in (2.84) constitute the complementary function of the given equation. We can very well argue that a particular solution $y_p$ of the given equation should also satisfy (2.83). This means that the remaining terms of (2.84) must give the basic form of $y_p$:

yp = A + Bx + Cx2 (2.85)

where, c1 , c2 , c3 are replaced by A, B and C respectively. For (2.85) to be


a particular solution of the given equation, it is necessary to find specific
coefficients A, B and C.
Differentiating (2.85), we obtain

yp0 = B + 2Cx, yp00 = 2C.

Substituting these values into the given equation

y 00 + 3y 0 + 2y = 4x2 ,

we get

$$y_p'' + 3y_p' + 2y_p = 2C + 3B + 6Cx + 2A + 2Bx + 2Cx^2 = 4x^2,$$
or
$$(2C + 3B + 2A) + (6C + 2B)x + 2Cx^2 = 0 + 0x + 4x^2.$$
Comparing constant terms and coefficients of $x$ and $x^2$, we get
$$2A + 3B + 2C = 0, \qquad 6C + 2B = 0, \qquad 2C = 4.$$
This implies $C = 2$, $B = -6$, and $A = 7$. Thus $y_p = 7 - 6x + 2x^2$.


Step 3. The general solution of the given equation is

y = yc + yp or y = c1 e−x + c2 e−2x + 7 − 6x + 2x2 .
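The general solution obtained in Example 115 can be checked for all values of the constants at once by symbolic substitution. A minimal sympy sketch (sympy is our assumption here):

```python
import sympy as sp

x, c1, c2 = sp.symbols("x c1 c2")
# general solution of Example 115: complementary part plus 7 - 6x + 2x^2
y = c1*sp.exp(-x) + c2*sp.exp(-2*x) + 7 - 6*x + 2*x**2

# residual of y'' + 3y' + 2y = 4x^2 must vanish identically in x, c1, c2
residual = sp.diff(y, x, 2) + 3*sp.diff(y, x) + 2*y - 4*x**2
assert sp.expand(residual) == 0
```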



Example 116. Solve the differential equation


y 00 − 9y = 54
by undetermined coefficient approach.
Solution: Applying $D$ to the differential equation, we obtain $D(D^2 - 9)y = 0$. The auxiliary equation is $m(m^2 - 9) = 0$, with roots $m_1 = 0$, $m_2 = 3$, $m_3 = -3$. The general solution of the annihilated equation is $y = c_1e^{3x} + c_2e^{-3x} + c_3$, so a particular solution has the form $y_p = A$. Putting the values of $y_p$, $y_p'$, $y_p''$ into the given differential equation we get
$$0 - 9A = 54, \quad\text{or}\quad A = -6.$$
Thus, the general solution is
y = c1 e3x + c2 e−3x − 6.
Example 117. Solve $y'' - 2y' + 5y = e^x\sin x$ using the undetermined coefficients (annihilator) approach.
Solution: Applying $D^2 - 2D + 2$, which annihilates $e^x\sin x$, to the differential equation, we obtain
$$(D^2 - 2D + 2)(D^2 - 2D + 5)y = 0.$$
The general solution of this equation is
$$y = e^x(c_1\cos 2x + c_2\sin 2x) + e^x(c_3\cos x + c_4\sin x),$$
so we take
$$y_p = Ae^x\cos x + Be^x\sin x.$$
Substituting $y_p$, $y_p'$, $y_p''$ into the given equation and simplifying gives
$$3Ae^x\cos x + 3Be^x\sin x = e^x\sin x.$$
Equating coefficients gives $A = 0$ and $B = \dfrac{1}{3}$. The general solution is
$$y = e^x(c_1\cos 2x + c_2\sin 2x) + \frac{1}{3}e^x\sin x.$$
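The particular solution of Example 117 can be verified directly, which is a useful habit after any annihilator computation. An illustrative sympy sketch:

```python
import sympy as sp

x = sp.symbols("x")
yp = sp.exp(x)*sp.sin(x)/3   # particular solution from Example 117

# residual of y'' - 2y' + 5y = e^x sin x must vanish
residual = (sp.diff(yp, x, 2) - 2*sp.diff(yp, x) + 5*yp
            - sp.exp(x)*sp.sin(x))
assert sp.simplify(residual) == 0
```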

2.5.7 Method of Variation of Parameters


The method of variation of parameters described below is applied to
solve a linear second order non-homogeneous differential equation of the form
a2 (x)y 00 + a1 (x)y 0 + a0 (x)y = g(x) which can be written in the standard form

y 00 + P (x)y 0 + Q(x)y = f (x). (2.86)

In (2.86) we assume that P (x), Q(x) and f (x) are continuous on some interval
I. As we have seen earlier in Sections 2.5.3 through 2.5.5 there is no difficulty
in finding the complementary function yc of (2.86) when P (x) and Q(x) are
constant functions.
Step 1. Find complementary function yc of (2.86) of the form yc = c1 y1 +c2 y2 .
Step 2. Find the Wronskian W of y1 and y2 , that is,

W (y1 , y2 ) = det [ y1  y2 ; y1'  y2' ] = y1 y2' − y2 y1'.

Step 3. Write

W1 = det [ 0  y2 ; f (x)  y2' ] = −y2 f (x),   W2 = det [ y1  0 ; y1'  f (x) ] = y1 f (x),

and find u1 and u2 by integrating

u1' = W1 /W,   u2' = W2 /W.
Step 4. Find a particular solution which is of form yp = u1 y1 + u2 y2 .
Step 5. The general solution of the equation is y = yc + yp .
Example 118. Solve the differential equation y 00 − y = xex by applying the
method of variation of parameters.
Solution: Corresponding homogeneous equation is y 00 − y = 0.
The auxiliary equation is m2 − 1 = 0.
Roots are m1 = 1, m2 = −1.
The complementary function is

y = c1 e^x + c2 e^{−x}

W = det [ e^x  e^{−x} ; e^x  −e^{−x} ] = −1 − 1 = −2

W1 = det [ 0  e^{−x} ; xe^x  −e^{−x} ] = −x

W2 = det [ e^x  0 ; e^x  xe^x ] = xe^{2x}

u1' = −x/(−2) = x/2

u2' = xe^{2x}/(−2) = −(1/2) xe^{2x}.

Integrating u1' and u2' we get

u1 = x^2/4

u2 = −(xe^{2x}/4) + (e^{2x}/8).

The general solution is

y = yc + yp

where yc = c1 e^x + c2 e^{−x} and

yp = u1 y1 + u2 y2 = (x^2/4) e^x − (xe^{2x}/4) e^{−x} + (e^{2x}/8) e^{−x}.

Thus y = c1 e^x + c2 e^{−x} + (1/4) x^2 e^x − (1/4) xe^x + (1/8) e^x.
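The particular solution obtained by variation of parameters can be verified numerically; a minimal Python sketch (the sample points and step size are arbitrary accuracy choices):

```python
import math

def yp(x):
    # particular solution from variation of parameters:
    # yp = (x^2/4 - x/4 + 1/8) e^x
    return (x**2 / 4 - x / 4 + 1.0 / 8) * math.exp(x)

def residual(x, h=1e-4):
    # check y'' - y = x e^x using a central finite difference for y''
    d2 = (yp(x + h) - 2 * yp(x) + yp(x - h)) / h**2
    return d2 - yp(x) - x * math.exp(x)

assert all(abs(residual(x)) < 1e-5 for x in (0.0, 0.5, 1.0))
```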
Example 119. Apply the method of variation of parameters to solve the
differential equation
y 00 − y = cosh x.
Solution: The auxiliary equation is

m2 − 1 = 0, so m1 = 1 and m2 = −1

and complementary function

yc = c1 e^x + c2 e^{−x} = c1 y1 + c2 y2

W (y1 , y2 ) = det [ e^x  e^{−x} ; e^x  −e^{−x} ] = −2

f (x) = cosh x = (1/2)(e^x + e^{−x})

W1 = det [ 0  e^{−x} ; cosh x  −e^{−x} ] = −e^{−x} cosh x

W2 = det [ e^x  0 ; e^x  cosh x ] = e^x cosh x

u1' = W1 /W = (e^{−x} cosh x)/2 = (1/4)(e^{−2x} + 1)

u2' = W2 /W = −(e^x cosh x)/2 = −(1/4)(1 + e^{2x})

u1 = −(1/8) e^{−2x} + (1/4) x

u2 = −(1/4) x − (1/8) e^{2x}

yp = [−(1/8) e^{−2x} + (1/4) x] e^x + [−(1/4) x − (1/8) e^{2x}] e^{−x}.

The general solution is


y = yc + yp .

2.5.8 Cauchy-Euler Equation


Second order equations of the form

ax^2 (d^2 y/dx^2) + bx (dy/dx) + cy = g(x) (2.87)
where a, b and c are constants, a 6= 0, and g(x) is continuous on a given interval
are called Cauchy-Euler equations. By putting y = xm , y 0 = mxm−1 , y 00 =
m(m − 1)xm−2 in (2.87) we get

ax^2 (d^2 y/dx^2) + bx (dy/dx) + cy = am(m − 1)x^m + bmx^m + cx^m
= (am(m − 1) + bm + c)x^m .

Thus y = xm is a solution of

ax^2 (d^2 y/dx^2) + bx (dy/dx) + cy = 0 (2.88)
where m is solution of the auxiliary equation :

am(m − 1) + bm + c = 0 or am2 + (b − a)m + c = 0. (2.89)

There are three different cases to be considered:


Case 1: Distinct Real Roots
Let m1 and m2 be real roots of (2.89) such that m1 ≠ m2 .
Then y1 = xm1 and y2 = xm2 form a fundamental set of solutions. Hence the
general solution is
y = c1 xm1 + c2 xm2 . (2.90)
Case 2: Repeated Roots
If the roots of (2.89) are repeated, that is, m1 = m2 then the general solution
is of the form
y = c1 x^{m1} + c2 x^{m1} ln x. (2.91)
Case 3: Conjugate Complex Roots If the roots of (2.89) are conjugate
pair m1 = α + iβ, m2 = α − iβ, where α and β > 0 are real, then a solution
is
y = c1 x^{α+iβ} + c2 x^{α−iβ} .
This solution can be written in the real form as

y = xα [c1 cos(β ln x) + c2 sin(β ln x)]. (2.92)



Verification: x^{iβ} = (e^{ln x})^{iβ} = e^{iβ ln x} which, by Euler’s formula, is the same
as

x^{iβ} = cos(β ln x) + i sin(β ln x).

Similarly,
x^{−iβ} = cos(β ln x) − i sin(β ln x).
Adding and subtracting the last two results yields

x^{iβ} + x^{−iβ} = 2 cos(β ln x) and

x^{iβ} − x^{−iβ} = 2i sin(β ln x), respectively.

Since y = c1 x^{α+iβ} + c2 x^{α−iβ} is a solution for any values of the constants, we


see, in turn, for c1 = c2 = 1 and c1 = 1, c2 = −1 that

y1 = x^α (x^{iβ} + x^{−iβ}) and y2 = x^α (x^{iβ} − x^{−iβ}),

or y1 = 2x^α cos(β ln x) and y2 = 2ix^α sin(β ln x)
are also solutions.
Since the Wronskian of x^α cos(β ln x) and x^α sin(β ln x) is βx^{2α−1} ≠ 0, β > 0,
on the interval (0, ∞), we conclude that

y1 = xα cos(β ln x) and y2 = xα sin(β ln x)

constitute a fundamental set of real solutions of the differential equation.


Hence we get the general solution in the real form

y = xα [c1 cos(β ln x) + c2 sin(β ln x)].

Remark 30. The method described above holds true for similar equations of
order n.
Example 120. Solve the differential equations
(a) x2 y 00 − 2y = 0.
(b) x2 y 00 − 3xy 0 − 2y = 0.
(c) x2 y 00 + xy 0 + y = 0 subject to initial conditions y(1) = 1, y 0 (1) = 2.
Solution (a) The auxiliary equation is

m2 − m − 2 = 0 or (m + 1)(m − 2) = 0

so m1 = −1, m2 = 2.
The general solution is
y = c1 x−1 + c2 x2 .

(b) The auxiliary equation is

m^2 − 4m − 2 = 0

m = (4 ± √(16 + 8))/2 = 2 ± (1/2)√24 = 2 ± √6.

The general solution is

y = c1 x^{2+√6} + c2 x^{2−√6}.

(c) The auxiliary equation is m2 + 1 = 0 so that the general solution is given


by
y = c1 cos(ln x) + c2 sin(ln x) and

y' = −c1 (1/x) sin(ln x) + c2 (1/x) cos(ln x).
The initial conditions imply c1 = 1 and c2 = 2.
Thus, y = cos(ln x) + 2 sin(ln x).
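A quick numerical check of this solution and its initial conditions (a Python sketch for illustration; step sizes are arbitrary accuracy choices):

```python
import math

def y(x):
    # solution found above: y = cos(ln x) + 2 sin(ln x)
    return math.cos(math.log(x)) + 2 * math.sin(math.log(x))

def residual(x, h=1e-4):
    # check x^2 y'' + x y' + y = 0 by central finite differences
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return x**2 * d2 + x * d1 + y(x)

assert abs(y(1.0) - 1.0) < 1e-12                              # y(1) = 1
assert abs((y(1 + 1e-6) - y(1 - 1e-6)) / 2e-6 - 2.0) < 1e-6   # y'(1) = 2
assert abs(residual(2.0)) < 1e-5
```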

2.6 Solution of Engineering Problems Modeled by Differential Equations
2.6.1 Population Dynamics
We have seen in Section 2.2.1 that Equation (2.8) represents population
growth or decay. In fact, it also represents many other situations such as
radioactive decay and compound interest. The differential Equation (2.8),

dP/dt = λP (t)
with P (t) > 0 and λ a constant (either negative or positive) is solved by the
separable variable method
∫ (1/P (t)) dP = ∫ λ dt,

or ln P (t) = λt + c ,

where c is a constant of integration. This implies that

eln P (t) = eλt+c = eλt ec = Aeλt ,

where A = ec is a new constant. It is also clear that A = P (0) = P0 =


population at the time t = 0 (initial population). Thus the particular solution
of the population growth model (2.8) is

P (t) = P (0)eλt . (2.93)

Due to the presence of the natural exponential function in the solution the
population model (2.8)
dP/dt = λP
is often called the exponential or natural growth model. Figure 2.12 shows
a typical growth of P (t) in the case of λ > 0 and the case λ < 0 is given in
Figure 2.13.


FIGURE 2.12: Natural Growth

It is clear that if the population is given at any time t then the initial population


can be computed.
Example 121. The population of a city is known to increase at a rate pro-
portional to the number of people present at time t. If the population doubles
in 7 years, how long will it take to triple?
Solution: Let P (t) denote the population at time t and let P (0) denote the
population at t = 0 (initial population). Then
dP (t)/dt = λP (t) [see (2.8)].
Its solution is
P (t) = P (0)eλt


FIGURE 2.13: Natural Decay

as we saw earlier.
P (7) = 2P (0) = P (0)e^{7λ}

or

2 = e^{7λ} or ln 2 = 7λ

or λ = (ln 2)/7.

To find the tripling time, set 3P (0) = P (0)e^{λt}:

3 = e^{(ln 2/7) t} or ln 3 = (t/7) ln 2

so t = 7 (ln 3)/(ln 2) ≈ 11.1 years.
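The arithmetic can be confirmed in a few lines (Python here purely for illustration):

```python
import math

lam = math.log(2) / 7             # doubling in 7 years gives λ = (ln 2)/7
t_triple = math.log(3) / lam      # = 7 ln 3 / ln 2
print(round(t_triple, 2))         # about 11.09 years
```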
Example 122. According to an authentic source the total population of the
world in 1999 was 6 billion persons. Assume that since then the population
has increased by about 212,000 persons per day, in accordance with Equation
(2.8): dP (t)/dt = λP (t).
(a) What is the annual growth rate λ?
(b) Find the world population at the middle of the 21st century.
(c) In how many years will the world population be 60 billion (grow by 10
times)?
Solution: (a) P0 = P (0) = 6 billion, t = 0 year corresponds to (mid) 1999.
P (t) increasing by 212,000, or 0.000212 billion, persons per day at time t = 0
means that P'(0) = 0.000212 × 365.25 ≈ 0.07743 billion per year. From the
natural growth model

dP/dt = λP

with t = 0, we get

λ = P'(0)/P (0) ≈ 0.07743/6 ≈ 0.0129.

Thus the world population grew at the rate of about 1.29% annually in 1999.
This value of λ gives population function

P (t) = 6e0.0129t .

(b) Here t = 51 and so

P (51) = 6e^{(0.0129)(51)} ≈ 11.58 billion,

that is, the world population in 2050 will be about 11.58 billion. Thus at this
growth rate the world population doubles in roughly (ln 2)/0.0129 ≈ 54 years.
(c) By the given information P (t) = 60 and λ = 0.0129, so by (2.93), 60 = 6e^{0.0129t}
or

t = (ln 10)/0.0129 ≈ 178 years.

Thus, in 178 years, that is in 2177, the world population will be 10 times the
population in mid 1999.
Example 123. Suppose the population of a city grows at a rate proportional
to the population at time t. The initial population of 500 increases by 15% in
10 years. What will be the population in 30 years? How fast is the population
growing at t = 30?
Solution: Let P = P (t) be the population at time t. Then dP (t)/dt = λP (t) and
P (t) = Ae^{λt}.
Since P (0) = 500, we have A = 500 and P = 500e^{λt}.
Since 15% of 500 is 75, we get P (10) = 500e^{10λ} = 575.
Solving for λ we get λ = (1/10) ln(575/500) = (1/10) ln 1.15.
For t = 30, P (30) = 500e^{(1/10)(ln 1.15)(30)} = 500e^{3 ln 1.15} ≈ 760 persons.

P'(30) = dP/dt at t = 30 is λP (30) = (1/10)(ln 1.15)(760) ≈ 10.62 persons per year.
Example 124. The population of an insect grows in a pond at a rate propor-
tional to the number of insects present at time t. After 3 hours it is observed
that 400 insects are present. After 10 hours 2000 insects are present. What
was the initial number of insects?
Solution: Let P = P (t) be insect population at time t and P0 be the initial
population of the insect.
We have P = P0 e^{λt} by solving dP/dt = λP (t).
P (3) = 400 and P (10) = 2000 give 400 = P0 e^{3λ}, so e^λ = (400/P0)^{1/3}.
From P (10) = 2000, we get

2000 = P0 e^{10λ} = P0 (400/P0)^{10/3},

so 2000/400^{10/3} = P0^{−7/3} and P0 = (2000/400^{10/3})^{−3/7} ≈ 201.
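A short numerical check of the value P0 ≈ 201 (Python for illustration):

```python
import math

# P(t) = P0 e^{λ t}, with P(3) = 400 and P(10) = 2000
lam = math.log(2000 / 400) / (10 - 3)   # λ = (ln 5)/7
P0 = 400 * math.exp(-3 * lam)
print(round(P0))                        # about 201 insects initially
```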
As discussed in Section 2.2.1 population growth is also modelled by Equation
(2.9):
dP (t)/dt = aP (t) − bP^2 (t).

We solve it by separation of variables. Decomposing the left hand side of

dP/(P (a − bP )) = dt

into partial fractions and integrating yields

(1/(aP ) + b/(a(a − bP ))) dP = dt

(1/a) ln |P | − (1/a) ln |a − bP | = t + c

ln |P/(a − bP )| = at + ac

P/(a − bP ) = c1 e^{at}

P (t) = ac1 e^{at} − bc1 P (t)e^{at}

P (t)(1 + bc1 e^{at}) = ac1 e^{at}

P (t) = ac1 e^{at}/(1 + bc1 e^{at}) = ac1 /(e^{−at} + bc1).

Let P (0) = P0 , P0 ≠ a/b; we find c1 = P0 /(a − bP0).
After substitution and simplification,

P (t) = aP0 /(bP0 + (a − bP0)e^{−at}).
Example 125. A model for the population P (t) in a suburb of a large city
is given by the initial value problem
dP/dt = P (10^{−1} − 10^{−7} P ), P (0) = 5000,

where t is measured in months. What is the limiting value of the population?
At what time will the population be equal to one-half of the limiting value?
Solution: From dP/dt = P (1/10 − (1/10^7)P ) and P (0) = 5000 we obtain, by
the logistic solution derived above, a = 1/10, b = 1/10^7 and

P = 500/(0.0005 + 0.0995e^{−0.1t}),

so that P → 1,000,000 as t → ∞.
If P (t) = 500,000 then t ≈ 52.9 months.
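These values follow directly from the logistic solution derived above; a Python sketch confirming them (variable names are ad hoc):

```python
import math

a, b, P0 = 0.1, 1e-7, 5000   # from dP/dt = P(0.1 - 1e-7 P), P(0) = 5000

def P(t):
    # logistic solution P(t) = a P0 / (b P0 + (a - b P0) e^{-a t})
    return a * P0 / (b * P0 + (a - b * P0) * math.exp(-a * t))

limit = a / b                                    # limiting population
t_half = math.log((a - b * P0) / (b * P0)) / a   # time when P = limit/2
print(round(limit), round(t_half, 1))            # 1000000 and about 52.9
```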

2.6.2 Radioactive Decay


Example 126. It is known that the radioactive isotope of lead, Pb−209,
decays at a rate proportional to the amount present at time t and has a half
life of 3.3 hours. If 1 gram of this isotope is present initially, how much time
will it take for 90% of the lead to decay?
Solution: Let A = A(t) be the amount of lead present at time t. By the
discussion in Equation (2.10),

dA/dt = λA and A(0) = 1.

By solving this initial value problem we get A(t) = e^{λt}. At t = 3.3 hrs,

A(3.3) = e^{3.3λ} = 1/2, implying λ = (1/3.3) ln(1/2).

If 90% of the lead has decayed then 0.1 gram remains. Setting A(t) = 0.1 = e^{λt} gives

t = 3.3 ln 0.1/ln(1/2) ≈ 10.96 hrs.
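Numerically (Python for illustration):

```python
import math

lam = math.log(0.5) / 3.3     # decay constant from the 3.3 h half-life
t = math.log(0.1) / lam       # time at which 0.1 g remains out of 1 g
print(round(t, 2))            # about 10.96 hours
```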

Example 127. Assume 100 milligrams of a radioactive substance present at


time t = 0 decreases 3% in 6 hours. If the rate of decay is proportional to the
amount of the substance present at time t,
(a) Find the amount remaining after 24 hours.
(b) Determine the half life of the radioactive substance.
Solution: (a) Let A = A(t) be the amount present at time t. From Equation
(2.10) we get dA/dt = λA, where A(0) = 100 milligrams. Solving this initial
value problem we get A = 100e^{λt}. Since the amount of the radioactive
substance is 97 milligrams after 6 hours, 97 = A(6) = 100e^{6λ}, so λ = (1/6) ln 0.97.
Then A(24) = 100e^{24λ} = 100(0.97)^4 ≈ 88.5 milligrams.
(b) When A(t) = 50, 50 = 100e^{λt} or λt = ln(1/2), so

t = (1/λ) ln(1/2) = ln(1/2)/((1/6) ln 0.97) ≈ 136.5 hours.

2.6.3 Carbon Dating


The examples are related to finding the ages of artifacts, namely carbon
dating of an artifact.

Example 128. Suppose a fossilized wood contains 47% of its original C-14.
Assuming that the wood died at the initial time (t = 0), compute the time T
it took for the C-14 to decay to this amount.
Solution: By Equation (2.10),

dA/dt = λA, so A(t) = A0 e^{λt},

where A(t) represents the mass of C-14 at time t and A(0) = A0 denotes the
mass at the initial time. The half life of C-14 is approximately 5600 years, so
A(5600) = A0 /2 gives

(1/2)A0 = A0 e^{5600λ}, or λ = −(ln 2)/5600.

At time T ,

A(T ) = 0.47A0 = A0 e^{λT},

so

T = ln(0.47)/λ = −(5600/ln 2) ln(0.47) ≈ 6100 years.
Example 129. A fossilized bone is found to contain one thousandth the
original amount of C-14. Determine the age of the fossil.
Solution: dA/dt = λA(t), where λ is the constant of proportionality of decay.
The solution is A(t) = A(0)e^{λt} by separation of variables. The half life of
C-14 is approximately 5600 years, so A(5600) = A0 /2:

(1/2)A0 = A0 e^{5600λ}

or ln(1/2) = 5600λ

or λ = −(1/5600) ln 2 = −0.00012378.

Then

(1/1000)A0 = A0 e^{−0.00012378t}

t = ln 1000/0.00012378 ≈ 55,800 years.
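A short numerical confirmation (Python for illustration):

```python
import math

lam = -math.log(2) / 5600     # C-14 decay constant (half-life about 5600 yr)
t = math.log(1 / 1000) / lam  # solve A0/1000 = A0 e^{λ t}
print(round(t))               # about 55,800 years
```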

2.6.4 Newton’s Law of Cooling


Example 130. A thermometer is removed from a room where temperature is
80o F and is taken outside where the air temperature is 10o F. After 2 minutes
the thermometer shows the reading 40o F. What is reading of the thermometer
at t = 3 minutes? How long will it take for the thermometer to reach 20o F?
Solution: This phenomenon is modelled by

dT/dt = λ(T − 10),

see Section 2.2.4. Its solution is T = 10 + ce^{λt}.
If T (0) = 80 and T (2) = 40 then 80 = 10 + ce^0 and 40 = 10 + ce^{2λ}, so c = 70
and 30 = 70e^{2λ}, or 2λ = ln(30/70), that is, λ = (1/2) ln(3/7).

T (3) = 10 + 70e^{3·(1/2) ln(3/7)} ≈ 29.6°F.

If T (t) = 20, then 20 = 10 + 70e^{(1/2) ln(3/7) t}, so

10 = 70e^{(1/2) ln(3/7) t}

or ((1/2) ln(3/7)) t = ln(10/70)

t = ln(1/7)/((1/2) ln(3/7)) ≈ 4.6 minutes.
Example 131. A 4 kg roast, initially at 60o F is placed in a 375o F oven at 6
p.m. At 7.15 p.m. the temperature T (t) of the roast is 125o F. Find the time
when the roast will be at 150o F.
Solution: We take t = 0 which corresponds to 6 p.m. We assume that at
any instant the temperature T (t) of the roast is uniform throughout. We have
T (t) < A = 375, T (0) = 60, and T (at 7.15 p.m.) = T (75) = 125. Hence the
Equation (2.11) takes the form
dT/dt = λ(375 − T )

∫ dT/(375 − T ) = λ ∫ dt

−ln(375 − T ) = λt + c

375 − T = Be^{−λt}.

Now T (0) = 60 implies 375 − 60 = B or B = 315. Thus T (t) = 375 − 315e^{−λt}.
We know that T = 125 when t = 75. Substituting in the preceding equation
we get

T (75) = 125 = 375 − 315e^{−75λ}

315e^{−75λ} = 250

e^{75λ} = 315/250

or λ = (1/75) ln(315/250).

Setting T (t) = 150 in T (t) = 375 − 315e^{−λt}:

150 = 375 − 315e^{−λt}

315e^{−λt} = 225

or t = (1/λ) ln(315/225) ≈ 109 minutes,

so the roast reaches 150°F at about 7:49 p.m.

Application of Newton’s cooling law for determining time of death


A police officer discovers the body of a person presumably murdered. Using
Newton’s law we determine the time of death. The body is placed in a room
that is kept at constant temperature 70o F. For some time after the death, the
body will radiate the heat into the cooler room, causing the body’s tempera-
ture to decrease assuming that the body temperature was normal, 98.6o F at
the time of death. From the body’s current temperature the time of death can
be computed.
According to Newton’s cooling law the body will radiate heat energy into
the room at a rate proportional to the difference in temperature between the
body and the room. If T (t) is the body temperature at time t, then for some
constant of proportionality λ,
dT/dt = λ(T (t) − 70)

∫ dT/(T − 70) = ∫ λ dt

ln |T − 70| = λt + A, where A is a constant

|T − 70| = e^{λt+A} = Be^{λt}, where B = e^A.

Then
T − 70 = Beλt
T (t) = 70 + Beλt .
Suppose the officer arrived at 8.30 p.m. and the body temperature was 94.4o F;
8.30 p.m. is considered as t = 0 then

T (0) = 94.4 = 70 + Beλ0 = 70 + B



or B = 24.4,
giving T(t) = 70 + 24.4eλt .
Suppose the officer makes another measurement of the body’s temperature
at 10 p.m. and the recorded temperature is 89°F. Taking t = 90 minutes (the
time of the 10 p.m. reading) and T (90) = 89,

89 = 70 + 24.4e^{90λ}

implying λ = (1/90) ln(19/24.4). Thus,

T (t) = 70 + 24.4e^{(1/90) ln(19/24.4) t}.

Setting T (t) = 98.6:

98.6 = 70 + 24.4e^{(t/90) ln(19/24.4)}

28.6/24.4 = e^{(t/90) ln(19/24.4)}

ln(28.6/24.4) = (t/90) ln(19/24.4)

t = 90 ln(28.6/24.4)/ln(19/24.4) ≈ −57 minutes.

Therefore the time of death was about 57 minutes before 8:30 p.m., that is,
around 7:33 p.m.
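The computation can be reproduced in a few lines (Python here; variable names are ad hoc):

```python
import math

T_room = 70.0
B = 94.4 - T_room                         # 24.4, from the 8:30 p.m. reading (t = 0)
lam = math.log((89.0 - T_room) / B) / 90  # from the 10 p.m. reading (t = 90 min)
t_death = math.log((98.6 - T_room) / B) / lam
print(round(t_death, 1))                  # about -57 minutes, i.e. near 7:33 p.m.
```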

2.6.5 Spread of Diseases, Technologies and Rumor


Example 132. Suppose a student in a school of 1000 students is carrying a
flu virus. Find a differential equation governing the number of students N (t)
who have contracted the flu if the rate at which the disease spreads is
proportional to the number of interactions between the students with flu and
the students who have not yet been exposed to it.
Solution: The number of students with the flu is N and the number not
infected is 1000 − N , so
dN/dt = KN (1000 − N ).
Example 133. A technological innovation is introduced into a community
with a fixed population of n people at time t = 0. Find a differential equation
for the number of people N (t) who have adopted the innovation at time t,
under the condition that the rate at which the innovation spreads through
the community is jointly proportional to the number of people who have
adopted it and the number who have not.

Solution: The rate at which the technology innovation is adopted is propor-


tional to the number of people N who have adopted it and the number of
people M who have not adopted it. Thus M + N = n + 1 and so
dN/dt = λN (n + 1 − N ), N (0) = 1.
Example 134. (a) Describe a model for the spread of a rumor and solve it.
(b) Currently a rumor is known to 300 students in a residential university of
45000 students. It will be known to 900 students after one week. Assuming
logistic growth, find the number of students who will know the rumor after 4
weeks.

Solution: (a) As seen in Section 2.2.5 the desired model is

dN/dt = (λ/M ) N (M − N ),

where N (t) is the number of students who know the rumor at time t, and M is
the total number of students (population size). This equation can be written
as

dN/dt = µN (M − N ),

where µ = λ/M is constant. Separating variables and integrating,

dN/(N (M − N )) = µ dt

∫ dN/(N (M − N )) = µ ∫ dt + c

(1/M ) ln |N/(M − N )| = µt + c

so ln |N/(M − N )| = M µt + M c.

Since N > 0 and M − N > 0, |N/(M − N )| = N/(M − N ). In exponential form

N/(M − N ) = e^{M µt+M c} = Ae^{M µt}, A = e^{M c}

N = (M − N )Ae^{M µt}

N (1 + Ae^{M µt}) = M Ae^{M µt}

N = M Ae^{M µt}/(Ae^{M µt} + 1).
Dividing numerator and denominator by Ae^{M µt},

N = M/(1 + (1/A)e^{−µM t}) = M/(1 + be^{−µM t}) = M/(1 + be^{−ct}),

where b = 1/A and c = µM .
(b) M = 45000. For t = 0, N = 300:

300 = 45000/(1 + b) ⇒ b = 149,

so N = 45000/(1 + 149e^{−ct}).
For t = 1, N = 900:

900 = 45000/(1 + 149e^{−c})

1 + 149e^{−c} = 45000/900 = 50 or e^{−c} = 49/149, so

N = 45000/(1 + 149(49/149)^t).

For t = 4, N = 45000/(1 + 149(49/149)^4) ≈ 16,400 students know the rumor
after 4 weeks.
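With b = 149 and e^{−c} = 49/149 as found above, N (t) = 45000/(1 + 149(49/149)^t) can be evaluated directly (a Python sketch for illustration):

```python
M = 45000
b = M / 300 - 1     # 149, from N(0) = 300
r = 49 / 149        # e^{-c}, from N(1) = 900

def N(t):
    return M / (1 + b * r**t)

print(round(N(4)))  # about 16,400 students after 4 weeks
```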

2.6.6 Series Circuit


Example 135. A 12 volt battery is connected to a series circuit in which the
inductance is 1/2 henry and the resistance is 12 ohms. Find the current I if
the initial current is zero.
Solution: By the model of Section 2.2.6, we have

(1/2) dI/dt + 12I = 12, I(0) = 0.

This is an initial value problem having solution

I(t) = 1 + Ae^{−24t}.

Now I(0) = 0 = 1 + A implies A = −1. Therefore

I(t) = 1 − e^{−24t}.

2.6.7 Draining Tank


Example 136. A tank in the form of a right circular cylinder is leaking water
through a circular hole in its bottom. Find the height h of water in the tank
at any time t if the initial height of the water is 5 ft.
Solution: As discussed in Section 2.2.7, h(t) is the solution of the differential
equation

dh/dt = −(B/A)√(2gh) (see Equation (2.19))

where A is the cross-sectional area of the cylinder and B is the cross-sectional
area of the hole at the base of the container. This equation can be written as

dh/√h = −(B/A)√(2g) dt

or dh/√h = C dt, where C = −(B/A)√(2g),

∫ h^{−1/2} dh = C ∫ dt

2h^{1/2} = Ct + C1.

For t = 0, h = 5, implying C1 = 2√5. Thus h(t) = ((Ct + 2√5)/2)^2.

2.6.8 Spring and Mass


Example 137. A 16 lb weight stretches a spring 4 ft. Assuming that a damp-
ing force numerically equal to two times the instantaneous velocity acts on
the system, find the equation of motion if the weight is released from the
equilibrium position with an upward velocity of 6 ft/s.
Solution: By Hooke’s law 16 = 4k or k = 4 lb/ft, and w = mg gives
m = 16/32 = 1/2 slug.
The differential equation of motion is

(1/2) d^2 x/dt^2 = −4x − 2 dx/dt

or d^2 x/dt^2 + 4 dx/dt + 8x = 0 (see Section 2.2.8).

The auxiliary equation is m^2 + 4m + 8 = 0, so

m = (−4 ± √(16 − 32))/2, m1 = −2 + 2i, m2 = −2 − 2i,

and α = −2, β = 2, so the general solution is

x(t) = e^{−2t}(c1 cos 2t + c2 sin 2t).

With upward taken as the negative direction, the initial conditions x(0) = 0
and x'(0) = −6 give c1 = 0 and c2 = −3, so the equation of motion is

x(t) = −3e^{−2t} sin 2t.
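With the initial conditions x(0) = 0 and x'(0) = −6 (upward taken as negative), the equation of motion is x(t) = −3e^{−2t} sin 2t; a numerical check (Python for illustration, arbitrary step size):

```python
import math

def x(t):
    # equation of motion: x(t) = -3 e^{-2t} sin 2t
    return -3 * math.exp(-2 * t) * math.sin(2 * t)

def residual(t, h=1e-4):
    # check x'' + 4x' + 8x = 0 by central finite differences
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / h**2
    return d2 + 4 * d1 + 8 * x(t)

assert abs(x(0.0)) < 1e-12                 # released from equilibrium
v0 = (x(1e-6) - x(-1e-6)) / 2e-6
assert abs(v0 + 6.0) < 1e-6                # initial velocity -6 ft/s
assert abs(residual(0.5)) < 1e-5
```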

Example 138. A 16 lb weight is attached to a spring whose spring constant


is 8 lb/ft. What is the period of simple harmonic motion?
Solution: The motion is governed by d^2 x/dt^2 + ω^2 x = 0 with

k = 8, m = 16/32 = 1/2, ω^2 = k/m = 8/(1/2) = 16.

The differential equation of motion is

(1/2) d^2 x/dt^2 + 8x = 0

or d^2 x/dt^2 + 16x = 0.

The auxiliary equation is m^2 + 16 = 0, so m = ±4i. Therefore

x = c1 cos 4t + c2 sin 4t.

Period of oscillation T = 2π/ω = 2π/4 = π/2.
Frequency = 1/T = 4/(2π) = 2/π.
See Section 2.2.8.

2.6.9 Mixture of Salt and Payment of Loan


Example 139. A tank contains 200 liters of fluid in which 30 grams of salt is
dissolved. Pure water is pumped into the tank at a rate of 4 liter/min and the
well mixed solution is pumped out at the same rate. Find the number A(t) of
grams of salt in the tank at time t.

Solution: From Equation (2.23), we have

dA/dt = 0 − A/50,

implying A = ce^{−t/50}.
A(0) = 30 = ce^0, so c = 30.
Thus A(t) = 30e^{−t/50}.

Example 140. A tank is filled to capacity with 500 gallons of water. Brine
containing 2 pounds of salt per gallon is pumped into the tank at a rate 5
gal/min. The well mixed solution is pumped out at a faster rate of 10 gal/min.
(a) Find the number A(t) of pounds of salt in the tank at the time t. (b) when
is the tank empty?
Solution: From Equation (2.23) and by given data in this problem we have

dA/dt = 10 − 10A/(500 − (10 − 5)t) = 10 − 2A/(100 − t)

or dA/dt + (2/(100 − t))A = 10.

By integrating (the integrating factor is (100 − t)^{−2}) we have

A(t) = 1000 − 10t + c(100 − t)^2.

If A(0) = 0 then c = −1/10. The tank is empty in 100 minutes.
10
Example 141. Assume that a lake marked A has a volume of 480 km^3 and
that its rate of inflow from a lake marked B is 350 km^3 per year, with an
equal outflow to a lake marked C. Assume further that at time t = 0 (years)
the pollutant concentration of lake A, caused by past industrial pollution that
has now ceased, is five times that of lake B. If the outflow henceforth is
perfectly mixed lake water, how long will it take to reduce the pollution
concentration in lake A to twice that of lake B?

Solution: It is modelled by Equation (2.23), where n = 350 km^3/year, m = 480
km^3, p = c (the pollutant concentration of lake B), and A0 = A(0) = 5cm, that is,

dA/dt = 350c − (350/480)A

or dA/dt + (35/48)A = 350c,

that is, dA/dt + qA = k

where q = 35/48, k = 350c and the integrating factor is I = e^{qt}. By the
method discussed in Section 2.4.2, we find that

A(t) = e^{−qt}[A0 + ∫_0^t ke^{qt} dt]
= e^{−qt}[A0 + (k/q)(e^{qt} − 1)]
= e^{−nt/m}[5cm + cm(e^{nt/m} − 1)]   (since k/q = cm)

A(t) = cm + 4cme^{−nt/m}.

Now we find t such that A(t) = 2cm, that is,

cm + 4cme^{−nt/m} = 2cm

or 4 = e^{nt/m}

nt/m = ln 4 or t = (m/n) ln 4 = (480/350) ln 4 ≈ 1.901 years.
Example 142. Suppose a person needs a loan of $20,000 from a bank. The
bank is willing to provide the loan at an annual interest rate of 8%. What
monthly payment is required if the borrower wants to pay off the loan in four
years?
Solution: Let A(t) be the balance due on the loan at any time t. Suppose that
A is measured in dollars and t in years. Then dA/dt has units of dollars per
year. The balance on the loan is affected by two factors: accumulation of
interest tends to increase A(t) and the payments tend to reduce it. By
Equation (2.23) we get

dA/dt = γA − 12µ,

where γ is the annual interest rate and µ is the monthly payment rate; µ is
multiplied by 12 so that all terms have units of dollars per year. The initial
condition is A(0) = A0, where A0 is the amount of the loan.
Here γ = 0.08 and A0 = 20,000, thus dA/dt − 0.08A = −12µ. The integrating
factor is e^{−0.08t} and, by Section 2.4.2, A = 150µ + ce^{0.08t}. Since
A(0) = 20,000, c = 20,000 − 150µ and we have

A(t) = 20,000e^{0.08t} − 150µ(e^{0.08t} − 1).

In order to find the monthly payment that pays off the loan in four years we
set A(4) = 0 to get

µ = (20,000/150) · e^{0.32}/(e^{0.32} − 1) ≈ $486.88.
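The monthly payment can be computed and checked in a few lines (Python for illustration):

```python
import math

A0, gamma, years = 20000.0, 0.08, 4
g = math.exp(gamma * years)        # e^{0.32}
mu = (A0 / 150) * g / (g - 1)      # monthly payment making A(4) = 0
print(round(mu, 2))                # about 486.88 dollars per month

# the balance A(4) = A0 e^{0.32} - 150 mu (e^{0.32} - 1) is indeed zero:
balance = A0 * g - 150 * mu * (g - 1)
assert abs(balance) < 1e-6
```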

2.7 Laplace Transform for Linear Differential Equations


2.7.1 Introduction to Laplace Transform
Pierre Simon de Laplace, a French mathematician (1749–1827), was essentially
interested in describing nature using mathematics. The main goal of this
section is to present those results of Laplace which are used to find solutions
of differential and integral equations.
Definition and Fundamental Properties of Laplace Transform
The Laplace transform is considered an extension of the idea of the indefinite
integral transform:

I{f (t)} = ∫_0^x f (t) dt.
It is defined as follows:
Definition 47. The Laplace transform of f (t), denoted by L{f (t)} is de-
fined by

F (s) = L{f (t)} = ∫_0^∞ e^{−st} f (t) dt, (2.94)
where s is a real number called a parameter of the transform.
Remark 31. (a) Laplace transform takes a function f (t) into a function F (s)
of the parameter s.
(b) We present functions of t by lower case letters, f, g and h, and their
respective Laplace transforms by corresponding capital letters F, G and H.
Thus we write
L{f (t)} = F (s)
or F (s) = ∫_0^∞ e^{−st} f (t) dt.

(c) The defining equation for the Laplace transform is an improper integral,
which is defined as
∫_0^∞ e^{−st} f (t) dt = lim_{T→∞} ∫_0^T e^{−st} f (t) dt.

Thus, the existence of the Laplace transform of f depends upon the existence
of the limit.
(d) A Laplace transform is rarely computed by referring directly to the def-
inition and integrating. In practice we use tables of Laplace transforms of
commonly used functions; see for example Table 2.2 later in this section.
In Section 2.7.2 we will develop methods that are used to find the Laplace
transforms of a shifted or translated function, step functions, pulses and var-
ious other functions that often arise in applications.

(e) We shall verify that the Laplace transform is linear, that is,

L(f + g) = L(f ) + L(g)

L(λf ) = λL(f ) = λF.


Example 143. Show that
(i) L{f (t)} = 1/s, where s > 0, f (t) = 1.
(ii) L{f (t)} = 1/s^2, where s > 0 and f (t) = t.
(iii) L{f (t)} = n!/s^{n+1}, s > 0, where f (t) = t^n.
(iv) L{f (t)} = 1/(s^2 + 1), where f (t) = sin t.
(v) L{f (t)} = s/(s^2 + 4), where f (t) = cos 2t.
(vi) L{e^{at}} = 1/(s − a), s > a, where a is any real number.
(vii) L{e^{3t+1}} = e/(s − 3), s > 3.
Solution: (i) By Definition 47, we have

L{1} = ∫_0^∞ e^{−st}(1) dt = lim_{T→∞} ∫_0^T e^{−st} dt
= lim_{T→∞} [−(1/s)e^{−st}]_0^T
= lim_{T→∞} (1/s)(−e^{−sT} + e^{−s·0})
= (1/s)(−lim_{T→∞} e^{−sT} + 1)
= (1/s)[0 + 1] = 1/s, s > 0.
(ii) By Definition 47,

L{t} = ∫_0^∞ e^{−st} t dt = lim_{T→∞} ∫_0^T e^{−st} t dt
= lim_{T→∞} ([−(1/s)e^{−st} t]_0^T + (1/s) ∫_0^T e^{−st} dt)

by applying integration by parts. Since the first term is zero and the second
is (1/s)L{1} = 1/s^2 by part (i),

L{t} = 1/s^2.
s2
(iii) By Definition 47, we have
Z ∞
n
L(t ) = e−st tn dt.
0

By applying the formula for integration by parts n times we conclude that


n!
L(tn ) = ,
sn+1
Z ∞  n −st ∞
n ∞ −st n−1
Z
t e
e−st tn dt = − + e t dt.
0 s 0 s 0
The first term on the right side is equal to zero for n > 0 and s > 0, so
Z ∞
n ∞ −st n−1
Z
n
L(tn ) = e−st tn dt = e t dt = L(tn−1 ).
0 s 0 s
Replacing n with n − 1 in this equation, we get
(n − 1)
L(tn−1 ) = L(tn−2 ).
s
Combining values of L(tn ) and L(tn−1 ) one can write
n(n − 1)
L(tn ) = L(tn−2 ).
s2
Continuing in this way one gets
n(n − 1)(n − 2) . . . 3.2.1
L(tn ) = L(t0 ).
sn
1
Since L(t0 ) = L(1) = by part (i), we obtain
s
n!
L(tn ) = ,
sn+1
where n! = n(n − 1)(n − 2) . . . 3.2.1.
(iv) By Definition 47,

L(sin t) = ∫_0^∞ e^{−st} sin t dt = lim_{T→∞} ∫_0^T e^{−st} sin t dt.

Let I = ∫_0^∞ e^{−st} sin t dt. Integrating by parts twice,

I = [−(1/s)e^{−st} sin t]_0^∞ + (1/s) ∫_0^∞ e^{−st} cos t dt
= [−(1/s)e^{−st} sin t − (1/s^2)e^{−st} cos t]_0^∞ − (1/s^2) ∫_0^∞ e^{−st} sin t dt
= 1/s^2 − (1/s^2) I.

Bringing −(1/s^2)I to the left hand side we get

(1 + 1/s^2) I = 1/s^2

or I = (s^2/(s^2 + 1))(1/s^2) = 1/(s^2 + 1).

(v) By Definition 47,

L{cos 2t} = ∫_0^∞ e^{−st} cos 2t dt
= [e^{−st}(−s cos 2t + 2 sin 2t)/(s^2 + 4)]_0^∞
= lim_{m→∞} e^{−sm}(−s cos 2m + 2 sin 2m)/(s^2 + 4) + s/(s^2 + 4)
= 0 + s/(s^2 + 4)
= s/(s^2 + 4).
(vi) By Definition 47,

L{e^{at}} = ∫_0^∞ e^{−st} e^{at} dt = lim_{T→∞} ∫_0^T e^{(a−s)t} dt
= lim_{T→∞} [(1/(a − s))e^{(a−s)t}]_0^T
= lim_{T→∞} ((1/(a − s))e^{(a−s)T} − 1/(a − s))
= −1/(a − s) = 1/(s − a), provided a − s < 0, that is, s > a.

Thus the Laplace transform of e^{at} is

F (s) = L(e^{at}) = 1/(s − a) if s > a.
(vii) By Definition 47,

L{e^{3t+1}} = ∫_0^∞ e^{3t+1} e^{−st} dt = e ∫_0^∞ e^{(3−s)t} dt
= e [(1/(3 − s))e^{(3−s)t}]_0^∞
= e lim_{m→∞} ((1/(3 − s))e^{(3−s)m} − 1/(3 − s))
= e/(s − 3), s > 3.
It may be observed that for s ≤ 0, L(1) does not exist.
If s < 0, then the exponent of e is positive for t > 0. Therefore

lim_{T→∞} ∫_0^T e^{−st} dt = lim_{T→∞} [e^{−st}/(−s)]_0^T = lim_{T→∞} ((1/s) − (1/s)e^{−sT}) = ∞,

which means the integral diverges.
If s = 0, then the integral becomes

lim_{T→∞} ∫_0^T dt = lim_{T→∞} [t]_0^T = lim_{T→∞} T = ∞.
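The transforms derived above can be checked by evaluating the defining improper integral numerically; the sketch below (Python for illustration; the truncation point T and step count n are arbitrary accuracy choices) uses the trapezoid rule:

```python
import math

def laplace(f, s, T=60.0, n=20000):
    # truncated Laplace transform: integral of e^{-st} f(t) over [0, T];
    # for s > 0 and the functions below, the tail beyond T is negligible
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

s = 2.0
assert abs(laplace(lambda t: 1.0, s) - 1 / s) < 1e-4                 # L{1} = 1/s
assert abs(laplace(lambda t: t, s) - 1 / s**2) < 1e-4                # L{t} = 1/s^2
assert abs(laplace(math.sin, s) - 1 / (s**2 + 1)) < 1e-4             # L{sin t}
assert abs(laplace(lambda t: math.exp(-t), s) - 1 / (s + 1)) < 1e-4  # e^{at}, a = -1
```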

Theorem 23. Let f1 (t) and f2 (t) have Laplace transforms and let c1 and c2
be constants, then
(i) L{f1 (t) + f2 (t)} = L{f1 (t)} + L{f2 (t)},
(ii) L{c1 f1 (t)} = c1 L{f1 (t)} and
L{c2 f2 (t)} = c2 L{f2 (t)}, equivalently,

(iii) L{c1 f1 (t) + c2 f2 (t)} = c1 L{f1 (t)} + c2 L{f2 (t)}.



Proof. (iii) LHS = L{c1 f1 (t) + c2 f2 (t)}

= ∫_0^∞ e^{−st}[c1 f1 (t) + c2 f2 (t)] dt
= ∫_0^∞ [c1 e^{−st} f1 (t) + c2 e^{−st} f2 (t)] dt
= c1 ∫_0^∞ e^{−st} f1 (t) dt + c2 ∫_0^∞ e^{−st} f2 (t) dt
= c1 L{f1 (t)} + c2 L{f2 (t)}, by using properties of integrals.

Definition 48. A function f is said to be of exponential order if there exist


real numbers a, M and t0 such that

|f (t)| ≤ M eat for t > t0 .

Example 144. Check whether the following functions are of exponential


order.
(a) f (t) = t^2  (b) f (t) = e^t  (c) f (t) = sin t  (d) f (t) = e^{t^2}.
Solution (a) Let a be any constant > 0; then

lim_{t→∞} |f (t)|e^{−at} = lim_{t→∞} t^2 e^{−at}
= lim_{t→∞} 2t/(ae^{at}) = lim_{t→∞} 2/(a^2 e^{at}) = 0,

where the last two limits are obtained by using L’Hopital’s rule. Therefore
there exists a positive constant M such that

|f (t)|e^{−at} ≤ M or f (t) ≤ M e^{at},

that is, f (t) = t^2 is of exponential order.


(b) Let a be any constant > 0; then

lim_{t→∞} |f (t)|e^{−at} = lim_{t→∞} e^t e^{−at} = lim_{t→∞} e^{(1−a)t} = 0 for a > 1.

Therefore we can find a and M > 0 such that |f (t)| ≤ M eat .


(c) lim_{t→∞} |f (t)|e^{−at} ≤ lim_{t→∞} e^{−at} = 0 for a > 0, implying there
exist a and M > 0 such that |f (t)| ≤ M e^{at}.
(d) f (t) = e^{t^2} is not of exponential order, since e^{t^2} e^{−at} = e^{t(t−a)} → ∞
as t → ∞ for every a; that is, e^{t^2} grows faster than e^{at} for any choice of a.
Now we prove the following basic existence theorem for the Laplace trans-
form of a function f .
TABLE 2.2: Laplace Transforms

Sr. No.  f(t)                                L{f(t)} = F(s)
1.       1                                   1/s
2.       t                                   1/s^2
3.       t^n                                 n!/s^{n+1}
4.       t^{-1/2}                            sqrt(π/s)
5.       t^{1/2}                             sqrt(π)/(2 s^{3/2})
6.       sin kt                              k/(s^2 + k^2)
7.       cos kt                              s/(s^2 + k^2)
8.       sin^2 kt                            2k^2/(s(s^2 + 4k^2))
9.       cos^2 kt                            (s^2 + 2k^2)/(s(s^2 + 4k^2))
10.      e^{at}                              1/(s - a)
11.      sinh kt                             k/(s^2 - k^2)
12.      cosh kt                             s/(s^2 - k^2)
13.      sinh^2 kt                           2k^2/(s(s^2 - 4k^2))
14.      cosh^2 kt                           (s^2 - 2k^2)/(s(s^2 - 4k^2))
15.      t e^{at}                            1/(s - a)^2
16.      t^n e^{at} (n a positive integer)   n!/(s - a)^{n+1}
17.      e^{at} sin kt                       k/((s - a)^2 + k^2)
18.      e^{at} cos kt                       (s - a)/((s - a)^2 + k^2)
19.      e^{at} sinh kt                      k/((s - a)^2 - k^2)
20.      e^{at} cosh kt                      (s - a)/((s - a)^2 - k^2)
21.      t sin kt                            2ks/(s^2 + k^2)^2
22.      t cos kt                            (s^2 - k^2)/(s^2 + k^2)^2
23.      sin kt + kt cos kt                  2ks^2/(s^2 + k^2)^2
24.      sin kt - kt cos kt                  2k^3/(s^2 + k^2)^2
25.      t sinh kt                           2ks/(s^2 - k^2)^2
26.      t cosh kt                           (s^2 + k^2)/(s^2 - k^2)^2
27.      (e^{at} - e^{bt})/(a - b)           1/((s - a)(s - b))
28.      (a e^{at} - b e^{bt})/(a - b)       s/((s - a)(s - b))
29.      1 - cos kt                          k^2/(s(s^2 + k^2))
30.      (e^{bt} - e^{at})/t                 ln((s - a)/(s - b))
31.      H(t - a) = u_a(t)                   e^{-as}/s,  s > 0
32.      δ(t)                                1
33.      δ(t - t_0), t_0 > 0                 e^{-s t_0}
34.      e^{at} f(t)                         F(s - a)
35.      f(t - a) H(t - a)                   e^{-as} F(s)
36.      f^{(n)}(t)                          s^n F(s) - s^{n-1} f(0) - ... - f^{(n-1)}(0)
37.      t^n f(t)                            (-1)^n d^n F(s)/ds^n
38.      ∫_0^t f(u) g(t - u) du              F(s) G(s)
39.      (sin at)/t                          arctan(a/s)
40.      (1/sqrt(πt)) e^{-a^2/(4t)}          e^{-a sqrt(s)}/sqrt(s)
41.      f(at)                               (1/a) F(s/a)
42.      ∫_0^t f(u) du                       F(s)/s
43.      f(t)/t                              ∫_s^∞ F(x) dx
44.      e^{at} f(t)                         F(s - a)
45.      f(t + T) = f(t), T > 0              (1/(1 - e^{-sT})) ∫_0^T e^{-st} f(t) dt
46.      (sin at)/t,  s > 0                  arctan(a/s)

Theorem 24. Let f be a piecewise continuous function of exponential order defined on [0, ∞). Then its Laplace transform exists for all values of the parameter s greater than some constant a.
Proof. Since the function f is of exponential order, we know that there are constants t_0, a and M such that
|f(t)| ≤ M e^{at} for t > t_0
or e^{-at}|f(t)| ≤ M for t > t_0
or |e^{-at} f(t)| ≤ M for t > t_0,
as |e^{-at} f(t)| = |e^{-at}||f(t)| = e^{-at}|f(t)|. It may be noted that e^{-at} is always positive. Multiplying both sides of |f(t)| ≤ M e^{at} by e^{-st}, we have
|e^{-st} f(t)| ≤ M e^{-(s-a)t}.
Hence
∫_0^∞ |e^{-st} f(t)| dt ≤ M ∫_0^∞ e^{-(s-a)t} dt
= [-M e^{-(s-a)t}/(s - a)]_0^∞
= -lim_{t→∞} M e^{-(s-a)t}/(s - a) + M/(s - a).
Since the first term is zero for s > a, we have
∫_0^∞ |e^{-st} f(t)| dt ≤ M/(s - a), s > a,
which implies the convergence of the improper integral defining the Laplace transform of f and completes the proof.
Corollary: Let f be a piecewise continuous function of exponential order defined on [0, ∞), and let F(s) = L{f(t)}. Then lim_{s→∞} F(s) = 0.
Proof.
|L{f(t)}| = |∫_0^∞ e^{-st} f(t) dt|
≤ ∫_0^∞ |e^{-st} f(t)| dt
≤ M/(s - a), s > a, as seen in the proof of Theorem 24.
Taking the limit as s → ∞ we get
lim_{s→∞} |F(s)| = 0, and hence lim_{s→∞} F(s) = 0.

2.7.2 Translation Theorems


The following are called shifting theorems.
Theorem 25. (The First Shifting Theorem):
Let L{f(t)} = F(s). Then
L{e^{at} f(t)} = F(s - a).

Proof. By definition of the Laplace transform, we write
L{e^{at} f(t)} = ∫_0^∞ e^{-st} (e^{at} f(t)) dt
= ∫_0^∞ e^{-(s-a)t} f(t) dt
= F(s - a).

Theorem 26. (The Second Shifting Theorem):
Let L{f(t)} = F(s). Then
L{H(t - a) f(t - a)} = e^{-as} F(s),
where H is the Heaviside function defined as
H(t) = 0 if t < 0, and 1 if t ≥ 0.
Proof. L{H(t - a) f(t - a)} = ∫_0^∞ e^{-st} H(t - a) f(t - a) dt
= ∫_a^∞ e^{-st} f(t - a) dt,
because H(t - a) = 0 for t < a and H(t - a) = 1 for t ≥ a. Now let u = t - a in the last integral. We get
L{H(t - a) f(t - a)} = ∫_0^∞ e^{-s(a+u)} f(u) du
= e^{-as} ∫_0^∞ e^{-su} f(u) du
= e^{-as} L{f(t)} = e^{-as} F(s).

Example 145. Apply the first shifting theorem to find
(a) L{e^{3t} sin t}.
(b) L{e^{-t} g(t)}, where g(t) = 2t for 0 ≤ t < 3 and g(t) = -1 for t ≥ 3.
(c) L^{-1}{4/(s^2 + 4s + 20)}.
Solution: (a) Since L{sin t} = 1/(s^2 + 1), it follows by Theorem 25 that
L{e^{3t} sin t} = 1/((s - 3)^2 + 1).
(b) By Theorem 25,
L{e^{-t} g(t)} = F(s + 1), where L{g(t)} = F(s). Now
F(s) = ∫_0^∞ e^{-st} g(t) dt
= ∫_0^3 2t e^{-st} dt - ∫_3^∞ e^{-st} dt
= (2/s^2 - 2e^{-3s}/s^2 - 6e^{-3s}/s) - e^{-3s}/s
= 2/s^2 - 2e^{-3s}/s^2 - 7e^{-3s}/s.
Therefore
L{e^{-t} g(t)} = 2/(s + 1)^2 - 2e^{-3(s+1)}/(s + 1)^2 - 7e^{-3(s+1)}/(s + 1).
(c) We have 4/(s^2 + 4s + 20) = 4/((s + 2)^2 + 16), so in the notation of Theorem 25,
F(s + 2) = 4/((s + 2)^2 + 16).
This means we should choose
F(s) = 4/(s^2 + 16).
By the first shifting Theorem 25,
L{e^{-2t} sin 4t} = F(s - (-2)) = F(s + 2) = 4/((s + 2)^2 + 16),
and therefore
L^{-1}{4/((s + 2)^2 + 16)} = e^{-2t} sin 4t.
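A result like part (a) can be sanity-checked by evaluating the defining integral numerically: for s > 3, ∫_0^∞ e^{-st} e^{3t} sin t dt should agree with 1/((s - 3)^2 + 1). A minimal Python sketch (the helper `laplace` and the sample value s = 5 are illustrative choices, not part of the text):

```python
import math

def laplace(f, s, T=40.0, n=80000):
    # trapezoidal approximation of the Laplace integral on [0, T];
    # for s > 3 the integrand decays, so truncating at T = 40 is safe
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for i in range(1, n):
        t = i * h
        total += math.exp(-s * t) * f(t)
    return total * h

s = 5.0  # any s > 3 keeps the integral convergent
numeric = laplace(lambda t: math.exp(3.0 * t) * math.sin(t), s)
exact = 1.0 / ((s - 3.0) ** 2 + 1.0)
print(numeric, exact)  # both close to 0.2
```

The two values agree to several decimal places, which is a quick guard against sign or shift errors in applying Theorem 25.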
Example 146. Compute L^{-1}{s e^{-3s}/(s^2 + 4)}.
Solution: By Theorem 26,
L{H(t - a) f(t - a)} = e^{-as} F(s), or H(t - a) f(t - a) = L^{-1}{e^{-as} F(s)}.
Here F(s) = s/(s^2 + 4), and
L^{-1}{F(s)} = L^{-1}{s/(s^2 + 4)} implies that f(t) = cos 2t.
Therefore,
L^{-1}{s e^{-3s}/(s^2 + 4)} = H(t - 3) cos 2(t - 3).
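The inverse found in Example 146 can be checked in the forward direction: numerically transforming H(t - 3) cos 2(t - 3) should reproduce s e^{-3s}/(s^2 + 4). A small sketch (the helper and the sample value s = 1 are illustrative assumptions):

```python
import math

def laplace(f, s, T=40.0, n=80000):
    # trapezoidal approximation of the Laplace integral on [0, T]
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for i in range(1, n):
        t = i * h
        total += math.exp(-s * t) * f(t)
    return total * h

def f(t):
    # H(t - 3) cos 2(t - 3): zero before t = 3, shifted cosine after
    return math.cos(2.0 * (t - 3.0)) if t >= 3.0 else 0.0

s = 1.0
numeric = laplace(f, s)
exact = s * math.exp(-3.0 * s) / (s ** 2 + 4.0)
print(numeric, exact)
```

The jump at t = 3 costs the trapezoidal rule only O(h) accuracy there, which is still well within the tolerance of this check.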
s2 + 4

2.7.3 Inverse Laplace Transform


In the previous sections we have seen methods for finding the Laplace transform of a function. In this section we discuss how to reverse the process: we reconstruct a function f(t) whose Laplace transform F(s) is given.
Definition 49. Let f(t) be a function such that L{f(t)} = F(s); then f(t) is called the inverse Laplace transform of F(s). The inverse Laplace transform is designated as L^{-1} and we write
f(t) = L^{-1}{F(s)}.
In order to find an inverse transform we must be familiar with the formulas for finding Laplace transforms; see Table 2.2. One should learn to use this table in reverse. In general, however, the given Laplace transform will not be in a form that allows direct use of the table, so the given F(s) has to be algebraically manipulated into a form that can be found in the table. The most relevant result for this purpose is the linearity property of the inverse Laplace transform, which states that
L^{-1}{c_1 F_1(s) + c_2 F_2(s)} = c_1 L^{-1}{F_1(s)} + c_2 L^{-1}{F_2(s)},
where c_1 and c_2 are constants. The proof of this result follows from the definition of the inverse Laplace transform and the corresponding linearity of the Laplace transform.

Example 147. Find
(i) L^{-1}{1/(s + 2)}.
(ii) L^{-1}{1/(s^2 + 2)}.
(iii) L^{-1}{(-2s + 6)/(s^2 + 4)}.
(iv) L^{-1}{(s + 5)/(s^2 - 2s - 3)}.
(v) L^{-1}{(s + 1)/(s^2 - 4s)}.
(vi) L^{-1}{(2s - 6)/(s^2 + 9)}.
(vii) L^{-1}{(2s - 5)/((s - 1)^3 (s^2 + 4))}.
Solution: (i) From Table 2.2, L{e^{at}} = 1/(s - a). Choosing a = -2 we get L{e^{-2t}} = 1/(s + 2) and consequently, by the definition of the inverse Laplace transform,
L^{-1}{1/(s + 2)} = e^{-2t}.
(ii) By Table 2.2 (item 6, with k = sqrt(2)) and the linearity of the inverse Laplace transform we get
L^{-1}{1/(s^2 + 2)} = (1/sqrt(2)) L^{-1}{sqrt(2)/(s^2 + 2)} = (1/sqrt(2)) sin sqrt(2) t.
(iii) By linearity of the inverse transform,
L^{-1}{(-2s + 6)/(s^2 + 4)} = -2 L^{-1}{s/(s^2 + 4)} + 3 L^{-1}{2/(s^2 + 4)}
= -2 cos 2t + 3 sin 2t, by Table 2.2 (items 6 and 7).
(iv) Since s^2 - 2s - 3 = (s - 3)(s + 1), we write
(s + 5)/(s^2 - 2s - 3) = (s + 5)/((s - 3)(s + 1)) = A/(s - 3) + B/(s + 1),
where the constants A and B are to be determined. Now
A/(s - 3) + B/(s + 1) = (A(s + 1) + B(s - 3))/((s - 3)(s + 1)) = ((A + B)s + (A - 3B))/((s - 3)(s + 1)),
so that
s + 5 = (A + B)s + (A - 3B).
This gives A + B = 1 and A - 3B = 5. Subtracting the second equation from the first we get 4B = -4, so B = -1. Putting this value in the first equation we get A = 2. Therefore,
L^{-1}{(s + 5)/(s^2 - 2s - 3)} = L^{-1}{2/(s - 3) - 1/(s + 1)} = 2 L^{-1}{1/(s - 3)} - L^{-1}{1/(s + 1)},
using linearity of L^{-1}. By Table 2.2 (item 10, for a = 3 and a = -1) we get
L^{-1}{1/(s - 3)} = e^{3t} and L^{-1}{1/(s + 1)} = e^{-t}.
Hence L^{-1}{(s + 5)/(s^2 - 2s - 3)} = 2e^{3t} - e^{-t}.
(v) Since (s + 1)/(s^2 - 4s) = (s + 1)/(s(s - 4)) = -(1/4)(1/s) + (5/4)(1/(s - 4)),
L^{-1}{(s + 1)/(s^2 - 4s)} = -(1/4) L^{-1}{1/s} + (5/4) L^{-1}{1/(s - 4)} = -1/4 + (5/4) e^{4t}
(by Table 2.2, items 1 and 10).
(vi) L^{-1}{(2s - 6)/(s^2 + 9)} = L^{-1}{2 s/(s^2 + 9) - 2 · 3/(s^2 + 9)} = 2 cos 3t - 2 sin 3t.
(vii) Given F(s) = (2s - 5)/((s - 1)^3 (s^2 + 4)), decompose F(s) into partial fractions as follows:
(2s - 5)/((s - 1)^3 (s^2 + 4)) = A/(s - 1) + B/(s - 1)^2 + C/(s - 1)^3 + (Ds + E)/(s^2 + 4).
By comparison, we find that the coefficients are
A = -17/125, B = 16/25, C = -3/5, D = 17/125, E = -63/125.
Using the first shifting property (Theorem 25) and the Laplace transforms of 1, t, t^2, sin 2t and cos 2t, we have
L^{-1}{F(s)} = -(17/125) e^t + (16/25) t e^t - (3/10) t^2 e^t + (17/125) cos 2t - (63/250) sin 2t
= (1/250)(-34 e^t + 160 t e^t - 75 t^2 e^t + 34 cos 2t - 63 sin 2t).
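Partial-fraction coefficients like those in (vii) are easy to mis-copy, and they can be verified by transforming the claimed inverse numerically and comparing with F(s) at a sample point. A sketch (the helper and the sample value s = 3 are illustrative assumptions; any s > 1 works):

```python
import math

def laplace(f, s, T=40.0, n=80000):
    # trapezoidal approximation of the Laplace integral on [0, T]
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for i in range(1, n):
        t = i * h
        total += math.exp(-s * t) * f(t)
    return total * h

def f(t):
    # claimed inverse transform from part (vii)
    return (-34 * math.exp(t) + 160 * t * math.exp(t) - 75 * t**2 * math.exp(t)
            + 34 * math.cos(2 * t) - 63 * math.sin(2 * t)) / 250.0

s = 3.0  # sample point with s > 1 so the integral converges
numeric = laplace(f, s)
exact = (2 * s - 5) / ((s - 1) ** 3 * (s ** 2 + 4))
print(numeric, exact)  # both close to 1/104
```

At s = 3 the exact value is (2·3 - 5)/(2^3 · 13) = 1/104, and the numerical integral matches it closely.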

2.7.4 Step and Impulse Functions


In engineering literature one often encounters functions that are discontinuous (switched off or on) or quite large over very small intervals. An important function representing such a situation is called the Heaviside function or unit step function, after Oliver Heaviside (1850-1925), an electrical engineer who introduced this function and developed methods to solve differential equations by applying Laplace transforms.
Definition 50. The Heaviside function, denoted by H, is defined by
H(t) = 0 for t < 0, and 1 for t ≥ 0.

Remark 32. H(t - a), also written u_a(t), is another notation for the shifted Heaviside function defined by
H(t - a) = 0 for 0 ≤ t < a, and 1 for t ≥ a.
The graph of the Heaviside function y = H(t - 3) is shown in Figure 2.14.

Remark 33. (i) It may be observed that H(t - a) f(t - a) is simply a translation of the function f(t), that is,
H(t - a) f(t - a) = 0 for 0 ≤ t < a, and f(t - a) for t ≥ a.

FIGURE 2.14: Heaviside Function H(t - 3)

(ii) An external force acting on a mechanical system or a voltage impressed on


a circuit turned off after a period of time can be represented by the Heaviside
function.
Example 148. Find the Laplace transforms of the following functions:
(i) H(t - 8).
(ii) t + H(t - 2)(t - 2)^2.
(iii) t - (t - 1) H(t - 1).
Solution: (i) L{H(t - 8)}(s) = ∫_0^∞ H(t - 8) e^{-st} dt
= ∫_0^8 H(t - 8) e^{-st} dt + ∫_8^∞ H(t - 8) e^{-st} dt
= 0 + ∫_8^∞ e^{-st} dt = e^{-8s}/s.
(ii) L{t + H(t - 2)(t - 2)^2} = L{t} + L{H(t - 2)(t - 2)^2}
= 1/s^2 + ∫_0^2 H(t - 2)(t - 2)^2 e^{-st} dt + ∫_2^∞ H(t - 2)(t - 2)^2 e^{-st} dt
= 1/s^2 + 0 + ∫_2^∞ (t - 2)^2 e^{-st} dt
= 1/s^2 + e^{-2s} L{t^2}(s)
= 1/s^2 + e^{-2s} (2/s^3)
= (s + 2e^{-2s})/s^3.

(iii) Using the translation property (Theorem 26), we have
F(s) = 1/s^2 - e^{-s}/s^2.
Example 149. Find the inverse Laplace transform of the function
F(s) = 1/s^4 + e^{-5s}/s^4.
Solution: f(t) = L^{-1}{F(s)} = L^{-1}{1/s^4} + L^{-1}{e^{-5s}/s^4}
= t^3/3! + H(t - 5)(t - 5)^3/3!, applying Theorem 26.
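Transforms involving shifted steps, like Example 148(ii), lend themselves to the same numerical cross-check used earlier: the transform of t + H(t - 2)(t - 2)^2 should equal 1/s^2 + 2e^{-2s}/s^3, where the factor 2 comes from L{t^2} = 2!/s^3. A sketch (the helper and the sample value s = 2 are illustrative assumptions):

```python
import math

def laplace(f, s, T=40.0, n=80000):
    # trapezoidal approximation of the Laplace integral on [0, T]
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for i in range(1, n):
        t = i * h
        total += math.exp(-s * t) * f(t)
    return total * h

def f(t):
    # t + H(t - 2)(t - 2)^2 from Example 148(ii)
    return t + ((t - 2.0) ** 2 if t >= 2.0 else 0.0)

s = 2.0
numeric = laplace(f, s)
exact = 1.0 / s**2 + 2.0 * math.exp(-2.0 * s) / s**3
print(numeric, exact)
```
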
Unit Impulse: Mechanical systems are often impacted by external forces of large magnitude that act only for very short times. One example is an electromagnetic force acting within an electrical circuit. A vibrating airplane wing may be struck by lightning. A spring may receive a sharp blow from a mass. A baseball, golf ball, or tennis ball soars into the air when struck violently by a bat, club, or racket. The idealized models of such impacts are called generalized functions or distributions.
Note, however, that these are not functions in the classical sense. They are designated Dirac delta functions to honor P.A.M. Dirac (1902-1984), recipient of the Nobel Prize in physics in 1933. Dirac supervised another Nobel laureate, Abdus Salam, who won the physics prize in 1979 and founded and led UNESCO's International Centre for Theoretical Physics in Trieste, Italy.
The Dirac delta function, introduced in 1930, fell outside the well known Lebesgue integral theory that engineers and physicists had used for 20 years. In 1950, Laurent Schwartz, a French mathematician, developed a rigorous mathematical theory of generalized functions. Known as distribution theory, it provided logical foundations for delta function techniques. Every piecewise continuous function is a distribution. However, the delta function defined below is an example of a distribution that is not a function in the classical sense. For any positive number ε, let
δ_ε(t) = (1/ε)[H(t) - H(t - ε)].
The impulse δ is defined by
δ(t) = lim_{ε→0+} δ_ε(t), and δ(t - a) = lim_{ε→0+} δ_ε(t - a), a > 0
(ε → 0+ means ε tends to zero from the right hand side).

Definition 51. (a) The Dirac delta function concentrated at t = 0, denoted by δ(t), is defined by
δ(t) = 0 for t ≠ 0, and ∫_{-∞}^{∞} δ(t) dt = 1.
(b) The Dirac delta function concentrated at a point t = a, denoted by δ(t - a), is defined by
δ(t - a) = 0 for t ≠ a, and ∫_{-∞}^{∞} δ(t - a) dt = 1.

Example 150. Find the Laplace transforms of the following functions:
(a) the Dirac delta function δ(t - a);
(b) δ(t - π);
(c) sin t · H(t - 2π) + δ(t - π/2).
Solution: (a) L{δ(t - a)} = L{lim_{ε→0+} δ_ε(t - a)}
= lim_{ε→0+} ∫_0^∞ (1/ε)[H(t - a) - H(t - a - ε)] e^{-st} dt
= lim_{ε→0+} (1/ε)[(1/s) e^{-as} - (1/s) e^{-(a+ε)s}]
= lim_{ε→0+} e^{-as} (1 - e^{-εs})/(εs) = e^{-as}
(as lim_{ε→0+} (1 - e^{-εs})/(εs) = 1). If a = 0, L{δ(t)} = e^{-0·s} = 1.
(b) We know L{δ(t - a)} = e^{-sa}, and so L{δ(t - π)} = e^{-sπ}.
(c) Since sin t is periodic with period 2π, we have
sin t · H(t - 2π) = sin(t - 2π) H(t - 2π).
Therefore, by the translation property (Theorem 26) and by part (a), we get the desired result:
L{sin t · H(t - 2π) + δ(t - π/2)} = e^{-2πs}/(s^2 + 1) + e^{-πs/2}.
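For the box approximation δ_ε used in part (a), the transform can be written in closed form, (e^{-as} - e^{-(a+ε)s})/(εs), and one can watch it approach e^{-as} as ε shrinks. A small sketch (the sample values a = 1 and s = 2 are illustrative assumptions):

```python
import math

def laplace_delta_eps(a, s, eps):
    # exact transform of δ_ε(t - a) = (1/ε)[H(t - a) - H(t - a - ε)]
    return (math.exp(-a * s) - math.exp(-(a + eps) * s)) / (eps * s)

a, s = 1.0, 2.0
for eps in (0.1, 0.01, 0.001):
    print(eps, laplace_delta_eps(a, s, eps))
print("limit:", math.exp(-a * s))  # e^{-as}
```

Each tenfold reduction of ε moves the value an order of magnitude closer to e^{-as}, mirroring the limit argument in the text.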

2.7.5 Some Additional Properties


The Laplace transforms of the derivatives of a differentiable function exist under appropriate conditions. In this section we discuss results that are quite useful in solving differential equations. For solving second order differential equations we need to evaluate the Laplace transforms of dy/dt and d^2y/dt^2.
Let f(t) be differentiable for t ≥ 0 and let its derivative f'(t) be continuous. By applying the formula for integration by parts we find that

L{f'(t)} = sF(s) - f(0).    (2.95)

Verification: By definition
L{f'(t)} = ∫_0^∞ e^{-st} f'(t) dt
= [e^{-st} f(t)]_0^∞ + s ∫_0^∞ e^{-st} f(t) dt
= -f(0) + s L{f(t)}
= sF(s) - f(0).
Here we have used the fact that
lim_{t→∞} e^{-st} f(t) = 0.
Similarly, for a twice differentiable function f(t) such that f''(t) is continuous we can prove that

L{f''(t)} = s^2 F(s) - sf(0) - f'(0).    (2.96)

In fact we can prove the following theorem by repeatedly applying integration by parts.
Theorem 27. Let f, f', ..., f^{(n-1)} be continuous on [0, ∞) and of exponential order, and let f^{(n)}(t) be piecewise continuous on [0, ∞). Then

L{f^{(n)}(t)} = s^n F(s) - s^{n-1} f(0) - s^{n-2} f'(0) - ... - f^{(n-1)}(0).    (2.97)

Theorem 27 can be used to generate a formula for the Laplace transform of the indefinite integral of a function f. We then have the following theorem.
Theorem 28. Let f be piecewise continuous and of exponential order for t ≥ 0. Then
L{∫_0^t f(u) du} = (1/s) L{f(t)} = F(s)/s.
Proof. Let g(t) = ∫_0^t f(u) du. Then g'(t) = f(t) and g(0) = 0. Furthermore, g(t) is of exponential order. By Theorem 27,
L{g'(t)} = s L{g(t)} - g(0),
or L{f(t)} = s L{∫_0^t f(u) du},
so that L{∫_0^t f(u) du} = (1/s) L{f(t)}.
Example 151. (a) Using the Laplace transform of f'', find L{sin kt}.
(b) Show that L^{-1}{(1/s) F(s)} = ∫_0^t f(u) du.
Solution: (a) Let f(t) = sin kt; then f'(t) = k cos kt, f''(t) = -k^2 sin kt, f(0) = 0 and f'(0) = k. Therefore
L{f''(t)} = s^2 F(s) - sf(0) - f'(0) = s^2 F(s) - k,
where F(s) = L{f(t)} = L{sin kt}, so that
L{-k^2 sin kt} = s^2 L{sin kt} - k.
Solving for L{sin kt} (from -k^2 L{sin kt} = s^2 L{sin kt} - k, that is, (s^2 + k^2) L{sin kt} = k) we get
L{sin kt} = k/(s^2 + k^2).
(b) By Theorem 28,
L{∫_0^t f(u) du} = (1/s) F(s).
This implies that ∫_0^t f(u) du = L^{-1}{(1/s) F(s)}.
Derivative of Laplace Transform
Theorem 29. Let f(t) be piecewise continuous and of exponential order over each finite interval, and let
L{f(t)} = F(s).
Then F(s) is differentiable and
F'(s) = L{-t f(t)}.
Proof. Suppose that |f(t)| ≤ M e^{at}, t > 0, and take any s_0 > a. Consider
∂/∂s (e^{-st} f(t)) = -t f(t) e^{-st}.
Choose ε > 0 such that s_0 > a + ε. Then |t| ≤ e^{εt} for all t large enough, since in fact
lim_{t→∞} t/e^{εt} = 0.
Thus |t f(t)| ≤ M e^{(a+ε)t} for all large t, so that t f(t) is also of exponential order, and
∫_0^∞ t f(t) e^{-st} dt
exists for s > a + ε by Theorem 24; the integral converges uniformly near s_0. Hence F(s) is differentiable at s_0 and the derivative may be taken under the integral sign:
F'(s_0) = ∫_0^∞ ∂/∂s (e^{-st} f(t)) dt evaluated at s = s_0.
Therefore
F'(s) = -∫_0^∞ t f(t) e^{-st} dt = L{-t f(t)} for all s > a.
Remark 34. It can be checked that if F(s) = L{f(t)} and n = 1, 2, 3, ..., then
L{t^n f(t)} = (-1)^n d^n F(s)/ds^n.
Definition 52. (Convolution). Let f and g be piecewise continuous functions for t ≥ 0. Then the convolution of f and g, denoted by f ∗ g, is defined by the integral
(f ∗ g)(t) = ∫_0^t f(u) g(t - u) du = ∫_0^t g(u) f(t - u) du = (g ∗ f)(t).

Theorem 30. (Convolution theorem). Let f and g be piecewise continuous and of exponential order for t ≥ 0. Then the Laplace transform of f ∗ g is given by the product of the Laplace transform of f and the Laplace transform of g. That is,
L{f ∗ g} = F(s)G(s).
Proof. Let F = L{f} and G = L{g}. Then
F(s)G(s) = F(s) ∫_0^∞ e^{-st} g(t) dt = ∫_0^∞ F(s) e^{-su} g(u) du,
in which we changed the variable of integration from t to u and brought F(s) inside the integral. Let us recall that
e^{-su} F(s) = L{H(t - u) f(t - u)},
where F(s) = L{f(t)} and H(·) is the Heaviside function; see Theorem 26. Substitute this into the integral for F(s)G(s) to get

F(s)G(s) = ∫_0^∞ L{H(t - u) f(t - u)} g(u) du.    (2.98)

But, from the definition of the Laplace transform,
L{H(t - u) f(t - u)} = ∫_0^∞ e^{-st} H(t - u) f(t - u) dt.
Substituting this into (2.98) we get
F(s)G(s) = ∫_0^∞ (∫_0^∞ e^{-st} H(t - u) f(t - u) dt) g(u) du
= ∫_0^∞ ∫_0^∞ e^{-st} g(u) H(t - u) f(t - u) dt du.
Let us recall that
H(t - u) = 0 if 0 ≤ t < u, and 1 if t ≥ u.
Therefore,
F(s)G(s) = ∫_0^∞ ∫_u^∞ e^{-st} g(u) f(t - u) dt du.
Figure 2.15 shows the region of integration in the t-u plane: the integral is over the shaded region, consisting of the points satisfying 0 ≤ u ≤ t < ∞. Reversing the order of integration gives us
F(s)G(s) = ∫_0^∞ ∫_0^t e^{-st} g(u) f(t - u) du dt
= ∫_0^∞ e^{-st} (∫_0^t g(u) f(t - u) du) dt
= ∫_0^∞ e^{-st} (f ∗ g)(t) dt
= L{f ∗ g}.

FIGURE 2.15: Region of Integration in the t-u Plane

It follows immediately from Theorem 30 that:


Theorem 31. Let L−1 (F ) = f, L−1 (G) = g. Then
L−1 {F G} = f ∗ g.
 
Example 152. (a) Evaluate L^{-1}{1/((s^2 + 25)^2)}.
(b) Let f(t) = e^{-t} and g(t) = sin t. Show that
(i) f ∗ g = g ∗ f and (ii) L{f} L{g} = L{f ∗ g}(s).
Solution: (a) Let F(s) = G(s) = 1/(s^2 + 5^2), so that
f(t) = g(t) = L^{-1}{1/(s^2 + 5^2)} = (1/5) sin 5t.
By the convolution theorem,
L^{-1}{1/((s^2 + 5^2)^2)} = (f ∗ g)(t) = (1/25) ∫_0^t sin 5u sin 5(t - u) du
= (1/50) ∫_0^t [cos 5(2u - t) - cos 5t] du
= (1/50) [(1/10) sin 5(2u - t) - u cos 5t]_0^t
= (sin 5t - 5t cos 5t)/250 = (sin 5t - 5t cos 5t)/(2 · 5^3).
(b) (i) (f ∗ g)(t) = ∫_0^t f(t - x) g(x) dx
= ∫_0^t e^{-(t-x)} sin x dx
= e^{-t} ∫_0^t e^x sin x dx
= e^{-t} [e^x (sin x - cos x)/2]_0^t
= (1/2)(sin t - cos t) + (1/2) e^{-t}.
On the other hand,
(g ∗ f)(t) = ∫_0^t g(t - x) f(x) dx
= ∫_0^t sin(t - x) e^{-x} dx
= (1/2)(sin t - cos t) + (1/2) e^{-t}.
So f ∗ g = g ∗ f.
(ii) We know that L{e^{-t}} = 1/(s + 1) and L{sin t} = 1/(s^2 + 1). Thus
F(s)G(s) = 1/((s + 1)(s^2 + 1)),
and
L^{-1}{F(s)G(s)} = L^{-1}{(1/2) · 1/(s + 1) + (1/2) · (-s + 1)/(s^2 + 1)}
= (1/2) L^{-1}{1/(s + 1)} - (1/2) L^{-1}{s/(s^2 + 1)} + (1/2) L^{-1}{1/(s^2 + 1)}
= (1/2) e^{-t} - (1/2) cos t + (1/2) sin t
= (f ∗ g)(t),
or F(s)G(s) = L{(f ∗ g)(t)}.
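The closed form found in (b)(i) can be compared against a direct numerical evaluation of the convolution integral. A sketch (the helper `conv` and the sample times are illustrative assumptions):

```python
import math

def conv(f, g, t, n=20000):
    # trapezoidal approximation of (f * g)(t) = ∫_0^t f(t - x) g(x) dx
    h = t / n
    total = 0.5 * (f(t) * g(0.0) + f(0.0) * g(t))
    for i in range(1, n):
        x = i * h
        total += f(t - x) * g(x)
    return total * h

f = lambda t: math.exp(-t)
g = math.sin

def closed_form(t):
    # result of Example 152(b)(i)
    return 0.5 * (math.sin(t) - math.cos(t)) + 0.5 * math.exp(-t)

for t in (0.5, 1.0, 2.0):
    print(t, conv(f, g, t), closed_form(t))
```

Agreement at several values of t gives good evidence that the antiderivative was taken correctly.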

2.7.6 Application to Differential and Integral Equations


In this section we discuss applications of the Laplace transform and related methods in finding solutions of differential equations with initial conditions and integral equations. In view of the discussion in Section 2.7.5, the most important feature of the Laplace method is that the initial values given in the problem are naturally incorporated into the solution process through Theorem 27 and Equations (2.95) and (2.96). One advantage of this method is that we need not find the general solution first and then solve for the constants to satisfy the initial condition.


General procedure of Laplace method for solving initial value prob-
lems.
Essentially a Laplace transform converts an initial value problem to an alge-
braic problem, incorporating initial conditions into the algebraic manipula-
tions. There are three basic steps:
(i) Find the Laplace transforms of both sides of the given differential equa-
tion, making use of the linearity property of the transforms.
(ii) Solve the transformed equation for the Laplace transform of the solution
function.
(iii) Find the inverse transform of the expression F (s) found in step (ii).
Example 153. Apply the Laplace transform to solve the initial value problems
(i) y' + 2y = 0, y(0) = 1.
(ii) dy/dx - y = 1, y(0) = 0.
Solution: (i) Given y' + 2y = 0, taking the Laplace transforms of both sides of this equation yields
L{y' + 2y} = L{0}, or L{y'} + 2L{y} = L{0}
(by linearity of the Laplace transform). Let L{y(t)} = Y(s); by applying Equation (2.95), the previous equation takes the form
sY(s) - 1 + 2Y(s) = 0.
Solving for Y(s) we have
Y(s) = 1/(s + 2).
The function y(t) is then found by taking the inverse transform of this equation. Thus
y(t) = L^{-1}{1/(s + 2)} = e^{-2t} (see Table 2.2).
(ii) Let L{y} = Y(s). Taking the Laplace transform of the initial value problem we have
sY(s) - y(0) - Y(s) = 1/s.
Solving for Y(s) we obtain
Y(s) = 1/(s(s - 1)) = -1/s + 1/(s - 1).
Thus y = -1 + e^t.
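The transform-domain answer of part (i), y(t) = e^{-2t}, can be cross-checked by integrating the same initial value problem directly, for instance with a classical fourth-order Runge-Kutta step (a generic numerical sketch, not a method from the text):

```python
import math

def rk4(f, y0, t_end, n):
    # classical fourth-order Runge-Kutta for y' = f(t, y), y(0) = y0
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# y' + 2y = 0, y(0) = 1, integrated to t = 1
numeric = rk4(lambda t, y: -2.0 * y, 1.0, 1.0, 1000)
print(numeric, math.exp(-2.0))  # both close to 0.135335
```
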

Example 154. Apply the Laplace transform to solve the initial value problems
(i) 2 dy/dt = -y, y(0) = -3.
(ii) y' - 4y = 1, y(0) = 1.
Solution: (i) The Laplace transform of the initial value problem gives us
2sY(s) - 2y(0) + Y(s) = 0, that is, 2sY(s) + 6 + Y(s) = 0.
Solving for Y(s) we get
Y(s) = -6/(2s + 1) = -3/(s + 1/2).
Taking the inverse Laplace transform of this equation we obtain
y(t) = -3e^{-t/2}
as the solution of the given initial value problem.
(ii) Let L{y(t)} = Y(s). Taking the Laplace transform of the differential equation, using the linearity of L and Equation (2.95), we get
L{y' - 4y} = L{1}
or L{y'} - 4L{y} = 1/s
or sY(s) - y(0) - 4Y(s) = 1/s
or sY(s) - 1 - 4Y(s) = 1/s
or Y(s)(s - 4) = 1 + 1/s
or Y(s) = 1/(s - 4) + 1/(s(s - 4)).
Taking the inverse Laplace transform of this equation we have
L^{-1}{Y(s)} = L^{-1}{1/(s - 4)} + L^{-1}{1/(s(s - 4))},
by the linearity of L^{-1}. By Table 2.2,
L^{-1}{1/(s - 4)} = e^{4t},
L^{-1}{1/(s(s - 4))} = (1/4) L^{-1}{1/(s - 4) - 1/s} = (1/4) e^{4t} - 1/4.
Thus
y(t) = e^{4t} + (1/4) e^{4t} - 1/4 = (5/4) e^{4t} - 1/4
is the solution of the given initial value problem.
Example 155. Solve the following initial value problems.
(i) y' + 6y = e^{4t}, y(0) = 2.
(ii) y' + 7y = H(t - 2), y(0) = 3.
(iii) y'' - 2y' + y = 3δ(t - 2), y(0) = 0, y'(0) = 1.
Solution: (i) The Laplace transform of the initial value problem yields
sY(s) - y(0) + 6Y(s) = 1/(s - 4).
Solving for Y(s) we get
Y(s) = 1/((s - 4)(s + 6)) + 2/(s + 6) = (1/10) · 1/(s - 4) + (19/10) · 1/(s + 6).
The inverse Laplace transform gives the solution of the initial value problem
y(t) = (1/10) e^{4t} + (19/10) e^{-6t}.
(ii) Taking Laplace transforms of both sides of the equation and using the property of the Laplace transform of derivatives, we get
sY(s) - 3 + 7Y(s) = e^{-2s}/s.
Solving for Y(s), we get
Y(s) = e^{-2s}/(s(s + 7)) + 3/(s + 7)
= (1/7) e^{-2s}/s - (1/7) e^{-2s}/(s + 7) + 3/(s + 7).
Using the inverse Laplace transform we obtain
y(t) = (1/7) H(t - 2) - (1/7) e^{-7(t-2)} H(t - 2) + 3e^{-7t}.
(iii) Taking Laplace transforms of the given equation and using the properties of transforms of higher order derivatives, we obtain
s^2 Y(s) - 1 - 2sY(s) + Y(s) = 3e^{-2s}.
Solving for Y(s) we get
Y(s) = 1/(s^2 - 2s + 1) + 3e^{-2s}/(s^2 - 2s + 1) = 1/(s - 1)^2 + 3e^{-2s}/(s - 1)^2.
Taking the inverse Laplace transform we obtain
y(t) = t e^t + 3(t - 2) e^{t-2} H(t - 2).

Example 156. Solve y'' + 4y' + 3y = e^t, y(0) = 0, y'(0) = 2.
Solution: Take the Laplace transform of the given differential equation to get
L{y''} + L{4y'} + L{3y} = L{e^t}.
Using Equations (2.95) and (2.96) and applying the initial conditions we get
L{y''} = s^2 Y(s) - sy(0) - y'(0) = s^2 Y(s) - 2
and L{y'} = sY(s) - y(0) = sY(s).
Therefore,
s^2 Y(s) - 2 + 4sY(s) + 3Y(s) = 1/(s - 1).
Solving this for Y(s), we get
Y(s) = (2s - 1)/((s - 1)(s^2 + 4s + 3)).
Let
(2s - 1)/((s - 1)(s + 1)(s + 3)) = A/(s - 1) + B/(s + 1) + C/(s + 3).
The equation can hold only if, for all s,
A(s + 1)(s + 3) + B(s - 1)(s + 3) + C(s - 1)(s + 1) = 2s - 1.
Now choose values of s to simplify the task of determining A, B and C.
Let s = 1 to get 8A = 1, so A = 1/8.
Let s = -1 to get -4B = -3, so B = 3/4.
Let s = -3 to get 8C = -7, so C = -7/8.
Then
Y(s) = (1/8) · 1/(s - 1) + (3/4) · 1/(s + 1) - (7/8) · 1/(s + 3).
By Table 2.2 we find that
y(t) = L^{-1}{Y(s)} = (1/8) e^t + (3/4) e^{-t} - (7/8) e^{-3t}.
This is the solution of the given initial value problem.
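A solution obtained this way can always be verified by substituting it back into the differential equation, here using central finite differences (the step h and the sample points are arbitrary illustrative choices):

```python
import math

def y(t):
    # candidate solution from Example 156
    return math.exp(t) / 8 + 3 * math.exp(-t) / 4 - 7 * math.exp(-3 * t) / 8

h = 1e-4
def d1(t):  # central first difference
    return (y(t + h) - y(t - h)) / (2 * h)
def d2(t):  # central second difference
    return (y(t + h) - 2 * y(t) + y(t - h)) / h**2

print(y(0.0), d1(0.0))  # initial conditions: about 0 and 2
for t in (0.5, 1.0, 2.0):
    # residual of y'' + 4y' + 3y - e^t, about 0 at each sample point
    print(t, d2(t) + 4 * d1(t) + 3 * y(t) - math.exp(t))
```
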
Example 157. Find L{f(t) ∗ g(t)}, where f(t) = e^{-t} and g(t) = sin 2t.
Solution: By Theorem 30, we have
L{f(t) ∗ g(t)} = F(s)G(s),
where F(s) = ∫_0^∞ e^{-st} e^{-t} dt = 1/(s + 1) (by Table 2.2)
and G(s) = 2/(s^2 + 4) (by Table 2.2).
Thus
L{f(t) ∗ g(t)} = (1/(s + 1)) · (2/(s^2 + 4)) = 2/((s + 1)(s^2 + 4)).
Example 158. Evaluate L{∫_0^t e^u sin(t - u) du}.
Solution: ∫_0^t e^u sin(t - u) du = (f ∗ g)(t), where f(t) = e^t and g(t) = sin t (by definition of convolution). By Theorem 30 we get
L{f(t) ∗ g(t)} = L{f(t)} L{g(t)}
= (1/(s - 1)) · (1/(s^2 + 1)) (by Table 2.2)
= 1/((s - 1)(s^2 + 1)).
(s − 1)(s2 + 1)

An equation involving an unknown function f(t), known functions g(t) and h(t), and an integral of f and h is called a Volterra integral equation for f(t):
f(t) = g(t) + ∫_0^t f(u) h(t - u) du.

Example 159. Solve the following Volterra integral equation for f(t):
f(t) = 3t^2 - e^{-t} - ∫_0^t f(u) e^{t-u} du.
Solution: We identify h(t - u) = e^{t-u}, so that h(t) = e^t. Taking Laplace transforms of both sides, we have
L{f(t)} = L{3t^2} - L{e^{-t}} - L{∫_0^t f(u) e^{t-u} du},
where L{f(t)} = F(s), L{3t^2} = 3L{t^2} = 3 · 2/s^3 = 6/s^3, and L{e^{-t}} = 1/(s + 1). By Definition 52 and Theorem 30,
L{∫_0^t f(u) e^{t-u} du} = L{f(t)} L{h(t)} = F(s) · 1/(s - 1).
Therefore,
F(s) = 6/s^3 - 1/(s + 1) - F(s)/(s - 1)
F(s)(1 + 1/(s - 1)) = (-s^3 + 6s + 6)/(s^3(s + 1))
F(s) = ((s - 1)/s) · (-s^3 + 6s + 6)/(s^3(s + 1))
F(s) = 6/s^3 - 6/s^4 + 1/s - 2/(s + 1),
by carrying out the partial fraction decomposition. The inverse transform then gives
f(t) = 3L^{-1}{2!/s^3} - L^{-1}{3!/s^4} + L^{-1}{1/s} - 2L^{-1}{1/(s + 1)}
= 3t^2 - t^3 + 1 - 2e^{-t}.
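As with differential equations, a candidate solution of an integral equation can be substituted back numerically: f(t) + ∫_0^t f(u) e^{t-u} du should equal 3t^2 - e^{-t} for every t. A sketch (the quadrature helper and the sample times are illustrative assumptions):

```python
import math

def f(t):
    # candidate solution from Example 159
    return 3 * t**2 - t**3 + 1 - 2 * math.exp(-t)

def integral(t, n=20000):
    # trapezoidal approximation of ∫_0^t f(u) e^{t-u} du
    h = t / n
    total = 0.5 * (f(0.0) * math.exp(t) + f(t))
    for i in range(1, n):
        u = i * h
        total += f(u) * math.exp(t - u)
    return total * h

for t in (0.5, 1.0, 2.0):
    lhs = f(t) + integral(t)
    rhs = 3 * t**2 - math.exp(-t)
    print(t, lhs, rhs)
```
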
Example 160. Find f(t) such that
f(t) = 2t^2 + ∫_0^t f(t - u) e^{-u} du.
Solution: With g(t) = e^{-t}, it is clear that
(f ∗ g)(t) = ∫_0^t f(t - u) e^{-u} du,
and by Theorem 30,
L{f(t) ∗ g(t)} = L{f(t)} L{g(t)} = F(s) · 1/(s + 1).
By taking Laplace transforms of both sides of the integral equation we get
L{f(t)} = L{2t^2} + F(s)/(s + 1)
F(s) = 2 · 2/s^3 + F(s)/(s + 1)
F(s) · s/(s + 1) = 4/s^3
F(s) = 4(s + 1)/s^4 = 4/s^3 + 4/s^4.
With the inverse Laplace transform we get
f(t) = 2L^{-1}{2/s^3} + (2/3) L^{-1}{3!/s^4} = 2t^2 + (2/3) t^3.

Example 161. Find the function f(t) if
(i) f(t) = t + ∫_0^t f(u) sin(t - u) du.
(ii) f(t) = 4t - 3 ∫_0^t f(x) sin(t - x) dx.
Solution: (i) We can identify the integral as f(t) ∗ h(t), where h(t) = sin t. Taking the Laplace transform of both sides of the integral equation we get
L{f(t)} = L{t} + L{f(t) ∗ h(t)}.
By Theorem 30,
L{f(t) ∗ h(t)} = L{f(t)} L{h(t)} = F(s) · 1/(s^2 + 1).
Thus
F(s) = 1/s^2 + F(s)/(s^2 + 1)
or F(s) · s^2/(s^2 + 1) = 1/s^2
or F(s) = (s^2 + 1)/s^4 = 1/s^2 + 1/s^4.
Taking the inverse Laplace transform of this equation we get
f(t) = t + t^3/6.
(ii) Let F = L{f}. The equation f(t) = 4t - 3 ∫_0^t f(x) sin(t - x) dx can be written as
f(t) - 4t = -3 (f(t) ∗ sin t).
Applying Laplace transforms to both sides of this equation and using the convolution Theorem 30, we get
F(s) = 4/s^2 - 3F(s)/(s^2 + 1)
or F(s) · (s^2 + 4)/(s^2 + 1) = 4/s^2
or F(s) = 4(s^2 + 1)/(s^2(s^2 + 4)) = 1/s^2 + (3/2) · 2/(s^2 + 4).
Taking the inverse Laplace transform of this equation we get
f(t) = t + (3/2) sin 2t.
Example 162. Solve y'' - 6y' + 9y = t^2 e^{3t}, y(0) = 2, y'(0) = 17.
Solution: Applying the Laplace transform to both sides and using linearity we get
L{y''} - 6L{y'} + 9L{y} = L{t^2 e^{3t}}
or s^2 Y(s) - sy(0) - y'(0) - 6[sY(s) - y(0)] + 9Y(s) = 2/(s - 3)^3,
where Y(s) = L{y(t)} (using the Laplace transform of derivatives). Applying the initial conditions,
(s^2 - 6s + 9)Y(s) = 2s + 5 + 2/(s - 3)^3
Y(s) = (2s + 5)/(s - 3)^2 + 2/(s - 3)^5
Y(s) = 2/(s - 3) + 11/(s - 3)^2 + 2/(s - 3)^5
(decomposing the right hand side into partial fractions). Taking the inverse Laplace transform of this equation we get
y(t) = 2L^{-1}{1/(s - 3)} + 11L^{-1}{1/(s - 3)^2} + 2L^{-1}{1/(s - 3)^5}
or y(t) = 2e^{3t} + 11te^{3t} + (1/12) t^4 e^{3t}.

Example 163. Solve y'' + y = 4δ(t - 2π), y(0) = 1 and y'(0) = 0.
Solution: Taking the Laplace transform of both sides, we get
L{y''} + L{y} = 4L{δ(t - 2π)}
s^2 Y(s) - sy(0) - y'(0) + Y(s) = 4e^{-2πs}
(s^2 + 1)Y(s) - s = 4e^{-2πs}
Y(s) = s/(s^2 + 1) + 4e^{-2πs}/(s^2 + 1).
Taking the inverse Laplace transform of this equation we get
y(t) = cos t + 4 sin(t - 2π) H(t - 2π).
Since sin(t - 2π) = sin t, the solution can be written as
y(t) = cos t for 0 ≤ t < 2π, and cos t + 4 sin t for t ≥ 2π.

Example 164. Solve the initial value problem y'' - 5y' + 6y = H(t - 1), y(0) = 0, y'(0) = 1.
Solution: Applying the Laplace transform to both sides, we get
s^2 L{y} - sy(0) - y'(0) - 5[sL{y} - y(0)] + 6L{y} = e^{-s}/s.
Applying the initial conditions and writing Y(s) = L{y}:
(s^2 - 5s + 6)Y(s) = e^{-s}/s + 1
or Y(s) = e^{-s}/(s(s - 2)(s - 3)) + 1/((s - 2)(s - 3)).
By decomposition into partial fractions,
L^{-1}{1/(s(s - 2)(s - 3))} = L^{-1}{(1/6)(1/s) - (1/2)(1/(s - 2)) + (1/3)(1/(s - 3))} = 1/6 - (1/2)e^{2t} + (1/3)e^{3t},
and L^{-1}{1/((s - 2)(s - 3))} = L^{-1}{1/(s - 3) - 1/(s - 2)} = e^{3t} - e^{2t}. Hence, using Theorem 26,
y(t) = [1/6 - (1/2)e^{2(t-1)} + (1/3)e^{3(t-1)}] H(t - 1) + e^{3t} - e^{2t}.
Example 165. Solve the initial value problem y 00 + 4y 0 = sin tH(t −
2π), y(0) = 1, y 0 (0) = 0.
Example 166. Solve the initial value problem y 00 + 4y = sin 3t, y(0) =
y 0 (0) = 0.

2.8 Series Solution of Differential Equations


We have seen in Section 2.5 that we can solve linear differential equations of order two or more with constant coefficients; among equations with variable coefficients, the Cauchy-Euler equation is an exception. In fact most linear differential equations of higher order with variable coefficients cannot be solved in terms of elementary functions. The usual strategy for solving such equations is to assume a solution in the form of an infinite series and proceed in a manner similar to the method of undetermined coefficients (Subsection 2.5.6). Since these series solutions often turn out to be power series, it is appropriate to first summarize the properties of power series essential for a proper understanding of the discussion of Bessel's and Legendre's equations.

2.8.1 Review of Properties of Power Series


A power series in (x − a) is an infinite series of the form

X
2
c0 + c1 (x − a) + c2 (x − a) + · · · = cn (x − a)n . (2.99)
n=0

Series of (2.99) is also called a power series centered at a. The power series
centered
PN at a = 0 is often referred as the power series, that is, the series
n
c
n=0 n x . A power series centered at a is called convergent
P∞ at a specified
value of x if its sequence of partial sums SN (x) = n=0 cn (x − a)n , that is,
{SN (x)} is convergent. In other words the limit of {SN (x)} exists. If the limit
does not exist the power series is called divergent. The set of points x, at
which the power series is convergent, is called the interval of convergence
of the power series. P∞
For R > 0, a power series n=0 cn (x − a)n converges if |x − a| < R and
diverges if |x − a| > R. If the series converges only at a then R = 0, and if it
converges for all x then R = ∞, |x−a| < R is equivalent to a−R < x < a+R.
A power series may or may not converge at the end points a − R and a + R
of this interval. R is called the radius of convergence. P∞
A power series is called absolutely convergent if the series Σ_{n=0}^{∞} |c_n (x − a)^n|
converges. A power series converges absolutely within its interval of convergence.
By the ratio test, a power series centered at a, (2.99), is absolutely convergent if

L = |x − a| lim_{n→∞} |c_{n+1}/c_n| < 1;

the series diverges if L > 1, and the test fails if L = 1. A power series defines a function
f(x) = Σ_{n=0}^{∞} c_n (x − a)^n whose domain is the interval of convergence of the series.
If the radius of convergence R > 0, then f is continuous, differentiable and
integrable on the interval (a − R, a + R). Moreover f'(x) and ∫ f(x) dx can
be found by term-by-term differentiation and integration. Convergence at an


224 Modern Engineering Mathematics

end point may be lost by differentiation or gained through integration.
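As a small numerical illustration (not from the book), the ratio test gives the radius of convergence R = lim |c_n/c_{n+1}| when that limit exists. The series Σ x^n/2^n, used here purely as a sample, has every ratio equal to 2, hence R = 2:

```python
from fractions import Fraction

def ratio_R(coeff, n=500):
    """Approximate the radius of convergence R = lim |c_n / c_{n+1}|
    by evaluating the coefficient ratio at a single large index n."""
    return abs(Fraction(coeff(n)) / Fraction(coeff(n + 1)))

# For sum x^n / 2^n every ratio |c_n / c_{n+1}| equals 2, so R = 2.
print(ratio_R(lambda n: Fraction(1, 2**n)))  # -> 2
```

Exact rational arithmetic avoids the underflow that floats would hit for coefficients like 1/2^500.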



Let

y = Σ_{n=0}^{∞} c_n x^n,
y' = Σ_{n=0}^{∞} n c_n x^{n−1},
y'' = Σ_{n=0}^{∞} n(n − 1) c_n x^{n−2}.

We observe that the first term in y' and the first two terms in y'' are zero. Keeping
this in mind we can write

y' = Σ_{n=1}^{∞} n c_n x^{n−1},
y'' = Σ_{n=2}^{∞} n(n − 1) c_n x^{n−2}.        (2.100)
P∞
Identity property: If Σ_{n=0}^{∞} c_n (x − a)^n = 0, R > 0, for all x in the interval
of convergence, then c_n = 0 for all n.
Analytic at a point: A function f is analytic at a point a if it can be
represented by a power series in x − a with a positive or infinite radius of
convergence. The power series with c_n = f^{(n)}(a)/n!, that is, the series
Σ (f^{(n)}(a)/n!)(x − a)^n, is called the Taylor series of f at a. If a = 0 the Taylor
series is called a Maclaurin series. In calculus e^x, cos x, sin x and ln(1 + x)
can be written in the form of a power series, more precisely in the form of a
Maclaurin series. For example,

e^x = 1 + x + x^2/2! + · · ·
sin x = x − x^3/3! + x^5/5! − · · ·
cos x = 1 − x^2/2! + x^4/4! − x^6/6! + · · ·

for |x| < ∞.
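The partial sums of these Maclaurin series can be checked against the standard library functions; this quick numerical sanity check is an illustration, not part of the text:

```python
import math

def maclaurin_exp(x, terms=25):
    # partial sum of 1 + x + x^2/2! + ...
    return sum(x**n / math.factorial(n) for n in range(terms))

def maclaurin_sin(x, terms=25):
    # partial sum of x - x^3/3! + x^5/5! - ...
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1) for n in range(terms))

def maclaurin_cos(x, terms=25):
    # partial sum of 1 - x^2/2! + x^4/4! - ...
    return sum((-1)**n * x**(2*n) / math.factorial(2*n) for n in range(terms))

for x in (0.5, 1.0, 2.0):
    assert abs(maclaurin_exp(x) - math.exp(x)) < 1e-12
    assert abs(maclaurin_sin(x) - math.sin(x)) < 1e-12
    assert abs(maclaurin_cos(x) - math.cos(x)) < 1e-12
```

With 25 terms the truncation error at these sample points is far below the tolerance.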
Arithmetic of power series: Two power series can be combined through the
operations of addition, multiplication and division. The procedures for power
series are similar to those by which two polynomials are added, multiplied and
Differential Equations 225

divided. For example:

e^x sin x = (1 + x + x^2/2 + x^3/6 + x^4/24 + · · ·)(x − x^3/6 + x^5/120 − x^7/5040 + · · ·)
= (1)x + (1)x^2 + (−1/6 + 1/2)x^3 + (−1/6 + 1/6)x^4 + (1/120 − 1/12 + 1/24)x^5 + · · ·
= x + x^2 + x^3/3 − x^5/30 − · · · .

Since the power series for e^x and sin x converge for |x| < ∞, the product
series converges on the same interval.
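The multiplication above is a Cauchy product of the two coefficient sequences; a short sketch in exact rational arithmetic reproduces the coefficients of x + x^2 + x^3/3 − x^5/30 computed in the text:

```python
from fractions import Fraction
from math import factorial

N = 8
# Maclaurin coefficients of e^x and sin x
exp_c = [Fraction(1, factorial(n)) for n in range(N)]
sin_c = [Fraction((-1)**((n - 1)//2), factorial(n)) if n % 2 else Fraction(0)
         for n in range(N)]

# Cauchy product: the coefficient of x^k is sum_{i=0}^{k} a_i * b_{k-i}
prod_c = [sum(exp_c[i] * sin_c[k - i] for i in range(k + 1)) for k in range(N)]

print(prod_c[:6])  # [Fraction(0, 1), Fraction(1, 1), Fraction(1, 1), Fraction(1, 3), Fraction(0, 1), Fraction(-1, 30)]
```

The x^4 coefficient comes out exactly zero, matching the cancellation (−1/6 + 1/6) shown above.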
Shifting the summation index: In order to discuss power series solutions of
differential equations it is often necessary to combine two or more summations
into a single summation.
P∞ P∞
Example 167. Express n=2 n(n − 1)cn xn−2 + n=0 cn xn+1 as one power
series.
Solution: In order to add the two given series, it is necessary that both
summation indices start with the same number and that the powers of x in each
series match: if one series starts with a multiple of x to the first power,
we want the other series to start with the same power. Here the second series
starts with x^1, so we write the first term of the first series outside
the summation notation:

Σ_{n=2}^{∞} n(n − 1)c_n x^{n−2} + Σ_{n=0}^{∞} c_n x^{n+1} = 2 · 1 · c_2 x^0 + Σ_{n=3}^{∞} n(n − 1)c_n x^{n−2} + Σ_{n=0}^{∞} c_n x^{n+1}.

Both series on the right hand side now start with the same power of x, namely x^1. Let
k = n − 2 in the first series and k = n + 1 in the second. Then the
right hand side becomes

2c_2 + Σ_{k=1}^{∞} (k + 2)(k + 1)c_{k+2} x^k + Σ_{k=1}^{∞} c_{k−1} x^k.        (2.101)

Keep in mind that it is the value of the summation index that matters, not the
symbol used for it, which is a dummy variable. Now we are in a position to add
the series in (2.101) term by term, and we have

Σ_{n=2}^{∞} n(n − 1)c_n x^{n−2} + Σ_{n=0}^{∞} c_n x^{n+1} = 2c_2 + Σ_{k=1}^{∞} [(k + 2)(k + 1)c_{k+2} + c_{k−1}]x^k.
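The index-shift identity of Example 167 can be spot-checked numerically for an arbitrary choice of coefficients c_n; the sketch below truncates both sides at matching powers of x and compares them in exact rational arithmetic:

```python
from fractions import Fraction

c = [Fraction(1, n + 1) for n in range(30)]   # arbitrary sample coefficients
x = Fraction(1, 3)
K = 25                                        # keep powers up to x^(K-1)

# Left side: the two original series, truncated at the power x^(K-1)
lhs = (sum(n*(n - 1)*c[n]*x**(n - 2) for n in range(2, K + 2)) +
       sum(c[n]*x**(n + 1) for n in range(K - 1)))

# Right side: the combined single series from the worked example
rhs = 2*c[2] + sum(((k + 2)*(k + 1)*c[k + 2] + c[k - 1])*x**k for k in range(1, K))

assert lhs == rhs   # exact equality in rational arithmetic
```

Because both sides are finite sums of Fractions, the comparison is exact, with no floating-point tolerance needed.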

2.8.2 Solution about Ordinary Point


We look for a power series solution of a linear second order differential
equation

a_2(x) y'' + a_1(x) y' + a_0(x) y = 0        (2.102)

where a_2(x) ≠ 0. This can be put into the standard form

y'' + (a_1(x)/a_2(x)) y' + (a_0(x)/a_2(x)) y = 0,

or

y'' + P(x) y' + Q(x) y = 0.        (2.103)

A point x_0 is said to be an ordinary point of the differential Equation
(2.102) if both P(x) and Q(x) can be represented by power series in x − x_0,
that is, if both are analytic at x_0. A point that is not
ordinary is called a singular point.
A solution of the form y = Σ_{n=0}^{∞} c_n (x − x_0)^n is said to be a solution
about the ordinary point x_0.
Remark 35. It has been proved that if x = x_0 is an ordinary point of (2.102)
then there exist two linearly independent solutions in the form of a power series
centered at x_0, that is, y = Σ_{n=0}^{∞} c_n (x − x_0)^n. A series solution converges at
least on some interval defined by |x − x_0| < R, where R is the distance from
x_0 to the closest singular point.
Power series solution about an ordinary point:
Let y = Σ_{n=0}^{∞} c_n x^n and substitute the values of y, y' and y'' into (2.103).
Combine series as in Example 167, and then equate the coefficient of each power of x
to zero to determine the coefficients c_n. We illustrate the
method by the following examples. We also see through these examples how
the single assumption that y = Σ_{n=0}^{∞} c_n x^n leads to two sets of coefficients,
so we have two distinct power series y_1(x) and y_2(x), both expanded about
the ordinary point x = 0. The general solution of the differential equation is
y = C_1 y_1(x) + C_2 y_2(x), and it can be shown that C_1 = c_0 and C_2 = c_1.
The differential equation y'' + xy = 0 is known as Airy's equation and is
used in studies of the diffraction of light, the diffraction of radio waves around
the surface of the earth, aerodynamics and other physical phenomena. We
discuss here the power series solution of this equation around its ordinary point
x = 0.
Example 168. Write the general solution of Airy's equation y'' + xy = 0.

Solution: In view of the remark, two power series solutions centered at 0
and convergent for |x| < ∞ exist. By substituting y = Σ_{n=0}^{∞} c_n x^n and
y'' = Σ_{n=2}^{∞} n(n − 1)c_n x^{n−2} into Airy's differential equation we get

y'' + xy = Σ_{n=2}^{∞} n(n − 1)c_n x^{n−2} + x Σ_{n=0}^{∞} c_n x^n
         = Σ_{n=2}^{∞} n(n − 1)c_n x^{n−2} + Σ_{n=0}^{∞} c_n x^{n+1}.        (2.104)

As seen in the solution of Example 167, (2.104) can be written as

y'' + xy = 2c_2 + Σ_{k=1}^{∞} [(k + 1)(k + 2)c_{k+2} + c_{k−1}]x^k = 0.        (2.105)

Since (2.105) is identically zero, it is necessary that the coefficient of each power
of x be set equal to zero, that is,

2c_2 = 0  (coefficient of x^0), and
(k + 1)(k + 2)c_{k+2} + c_{k−1} = 0,  k = 1, 2, 3, . . . .        (2.106)

The above holds in view of the identity property. It is clear that c_2 = 0. The
expression in (2.106) is called a recurrence relation and it determines the c_k
in such a manner that we can choose a certain subset of the set of coefficients
to be non-zero. Since (k + 1)(k + 2) ≠ 0 for all values of k, we can solve (2.106)
for c_{k+2} in terms of c_{k−1}:

c_{k+2} = −c_{k−1}/((k + 1)(k + 2)),  k = 1, 2, 3, . . .        (2.107)

For k = 1: c_3 = −c_0/(2 · 3).
For k = 2: c_4 = −c_1/(3 · 4).
For k = 3: c_5 = −c_2/(4 · 5) = 0, as c_2 = 0.
For k = 4: c_6 = −c_3/(5 · 6) = c_0/(2 · 3 · 5 · 6).
For k = 5: c_7 = −c_4/(6 · 7) = c_1/(3 · 4 · 6 · 7).
For k = 6: c_8 = −c_5/(7 · 8) = 0, as c_5 = 0.
For k = 7: c_9 = −c_6/(8 · 9) = −c_0/(2 · 3 · 5 · 6 · 8 · 9).
For k = 8: c_{10} = −c_7/(9 · 10) = −c_1/(3 · 4 · 6 · 7 · 9 · 10).
For k = 9: c_{11} = −c_8/(10 · 11) = 0, as c_8 = 0, and so on.

Substituting the coefficients just obtained into

y = Σ_{n=0}^{∞} c_n x^n = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 + c_6 x^6 + c_7 x^7 + c_8 x^8 + c_9 x^9 + c_{10} x^{10} + · · ·

we get

y = c_0 + c_1 x + 0 − (c_0/(2 · 3)) x^3 − (c_1/(3 · 4)) x^4 + 0 + (c_0/(2 · 3 · 5 · 6)) x^6
  + (c_1/(3 · 4 · 6 · 7)) x^7 + 0 − (c_0/(2 · 3 · 5 · 6 · 8 · 9)) x^9 − (c_1/(3 · 4 · 6 · 7 · 9 · 10)) x^{10} + 0 + · · ·
After grouping the terms containing c_0 and the terms containing c_1, we obtain
y = c_0 y_1(x) + c_1 y_2(x), where

y_1(x) = 1 − (1/(2 · 3)) x^3 + (1/(2 · 3 · 5 · 6)) x^6 − (1/(2 · 3 · 5 · 6 · 8 · 9)) x^9 + · · ·
       = 1 + Σ_{k=1}^{∞} ((−1)^k/(2 · 3 · · · (3k − 1)(3k))) x^{3k},

y_2(x) = x − (1/(3 · 4)) x^4 + (1/(3 · 4 · 6 · 7)) x^7 − (1/(3 · 4 · 6 · 7 · 9 · 10)) x^{10} + · · ·
       = x + Σ_{k=1}^{∞} ((−1)^k/(3 · 4 · · · (3k)(3k + 1))) x^{3k+1}.

Since the recursive use of (2.107) leaves c_0 and c_1 completely undetermined,
they can be chosen arbitrarily. Hence y = c_0 y_1(x) + c_1 y_2(x) is the general solution
of Airy's equation.
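The recurrence (2.107) can be checked in code: building the coefficients for the two choices (c_0, c_1) = (1, 0) and (0, 1) and substituting the truncated series into y'' + xy leaves only a tiny truncation residual. This sketch is an illustration, not the book's own computation:

```python
from fractions import Fraction

def airy_coeffs(c0, c1, N=40):
    """Series coefficients for y'' + x y = 0 from recurrence (2.107):
    c2 = 0 and c_{k+2} = -c_{k-1} / ((k+1)(k+2))."""
    c = [Fraction(c0), Fraction(c1), Fraction(0)] + [Fraction(0)] * (N - 3)
    for k in range(1, N - 2):
        c[k + 2] = -c[k - 1] / ((k + 1) * (k + 2))
    return c

def residual(c, x):
    """Evaluate y'' + x y for the truncated series with coefficients c."""
    y = sum(ci * x**i for i, ci in enumerate(c))
    ypp = sum(i * (i - 1) * ci * x**(i - 2) for i, ci in enumerate(c) if i >= 2)
    return ypp + x * y

for c0, c1 in ((1, 0), (0, 1)):
    r = residual(airy_coeffs(c0, c1), Fraction(1, 2))
    assert abs(float(r)) < 1e-20   # only truncation error survives
```

All arithmetic is exact, so the residual consists solely of the top-order terms dropped by truncation, which are astronomically small at x = 1/2.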
Example 169. Find two power series solutions of the differential equation
y 00 − xy = 0 about the ordinary point x = 0.
Solution: Substituting y = Σ_{n=0}^{∞} c_n x^n into the differential equation we get

y'' − xy = Σ_{n=2}^{∞} n(n − 1)c_n x^{n−2} − Σ_{n=0}^{∞} c_n x^{n+1}
         = Σ_{k=0}^{∞} (k + 2)(k + 1)c_{k+2} x^k − Σ_{k=1}^{∞} c_{k−1} x^k
         = 2c_2 + Σ_{k=1}^{∞} [(k + 2)(k + 1)c_{k+2} − c_{k−1}]x^k.

Thus c_2 = 0,

(k + 2)(k + 1)c_{k+2} − c_{k−1} = 0

and

c_{k+2} = c_{k−1}/((k + 2)(k + 1)),  k = 1, 2, 3, . . .

Choosing c_0 = 1 and c_1 = 0 we find

c_3 = 1/6,  c_4 = c_5 = 0,  c_6 = 1/180

and so on. For c_0 = 0 and c_1 = 1 we obtain

c_3 = 0,  c_4 = 1/12,  c_5 = c_6 = 0,  c_7 = 1/504

and so on. Thus two solutions are

y_1 = 1 + (1/6) x^3 + (1/180) x^6 + · · ·  and  y_2 = x + (1/12) x^4 + (1/504) x^7 + · · ·
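The coefficients in this example can be regenerated from the recurrence with exact rational arithmetic; a quick check, assuming the recurrence c_{k+2} = c_{k−1}/((k+2)(k+1)) derived above:

```python
from fractions import Fraction

def coeffs(c0, c1, N=10):
    # Recurrence for y'' - x y = 0: c2 = 0, c_{k+2} = c_{k-1} / ((k+2)(k+1))
    c = [Fraction(c0), Fraction(c1), Fraction(0)] + [Fraction(0)] * (N - 3)
    for k in range(1, N - 2):
        c[k + 2] = c[k - 1] / ((k + 2) * (k + 1))
    return c

y1 = coeffs(1, 0)
y2 = coeffs(0, 1)
assert (y1[3], y1[6]) == (Fraction(1, 6), Fraction(1, 180))
assert (y2[4], y2[7]) == (Fraction(1, 12), Fraction(1, 504))
```

The zero pattern (c_5 = 0 in y_1, c_5 = c_6 = 0 in y_2) falls out of the same recurrence since c_2 = 0.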
12 504

2.8.3 Solution about Regular Singular Points: The Method


of Frobenius
A singular point x_0 of (2.102) is called a regular singular point of this
equation if the functions p(x) = (x − x_0)P(x) and q(x) = (x − x_0)^2 Q(x) are
both analytic at x_0. A singular point that is not regular is said to be an
irregular singular point of the equation. This means that one or both of the
functions p(x) = (x − x_0)P(x) and q(x) = (x − x_0)^2 Q(x) fail to be analytic
at x_0.
In order to solve a differential equation given by (2.102) about a regular
singular point we employ the following theorem due to Frobenius.
Theorem 32. (Frobenius theorem) If x = x_0 is a regular singular point of
the differential Equation (2.102), then there exists at least one solution of the
form

y = (x − x_0)^r Σ_{n=0}^{∞} c_n (x − x_0)^n = Σ_{n=0}^{∞} c_n (x − x_0)^{n+r},

where r is a constant to be determined. The series will converge at least on some
interval 0 < x − x_0 < R.
Method of Frobenius: Finding a series solution about a regular singular point
x_0 is similar to the method of the previous subsection: we substitute
y = Σ_{n=0}^{∞} c_n (x − x_0)^{n+r} into the given differential equation and determine
the unknown coefficients c_n by a recurrence relation. However we have an
additional task in this procedure. Before determining the coefficients we must
find the unknown exponent r by equating to 0 the coefficient of the lowest power
of x. This yields the indicial equation, which determines the value(s) of the
index r.
If r is found to be a number that is not a non-negative integer, then the
corresponding solution y = Σ_{n=0}^{∞} c_n (x − x_0)^{n+r} is not a power series. For the
sake of simplicity we assume that the regular singular point is x = 0.
Example 170. Apply the method of Frobenius to solve the differential equation
2xy'' + 3y' − y = 0 about the regular singular point x = 0.
Solution: Let us assume that the solution is of the form

y = Σ_{n=0}^{∞} c_n x^{n+r};  then
y' = Σ_{n=0}^{∞} c_n (n + r) x^{n+r−1}  and
y'' = Σ_{n=0}^{∞} c_n (n + r)(n + r − 1) x^{n+r−2}.

Substituting these values of y, y', y'' into 2xy'' + 3y' − y = 0, we get

2 Σ_{n=0}^{∞} c_n (n + r)(n + r − 1) x^{n+r−1} + 3 Σ_{n=0}^{∞} c_n (n + r) x^{n+r−1} − Σ_{n=0}^{∞} c_n x^{n+r} = 0.

Shifting the index in the third series and combining the first two yields

Σ_{n=0}^{∞} c_n (n + r)(2n + 2r + 1) x^{n+r−1} − Σ_{n=1}^{∞} c_{n−1} x^{n+r−1} = 0.

Writing the term corresponding to n = 0 separately and combining the terms for n ≥ 1
into one series,

c_0 r(2r + 1) x^{r−1} + Σ_{n=1}^{∞} [c_n (n + r)(2n + 2r + 1) − c_{n−1}] x^{n+r−1} = 0.

Equating the coefficient of x^{r−1} to zero yields the indicial equation

c_0 r(2r + 1) = 0.

Since c_0 ≠ 0, either r = 0 or r = −1/2.
Hence two linearly independent solutions of the given differential equation
have the forms

y_1 = F_0(x) = Σ_{n=0}^{∞} c_n x^n  and
y_2 = F_{−1/2}(x) = x^{−1/2} Σ_{n=0}^{∞} c_n^* x^n.

Since c_n (n + r)(2n + 2r + 1) − c_{n−1} = 0 for all n ≥ 1, we have the following
information on the coefficients for the two series:

(i) c_0 is arbitrary, and for n ≥ 1, c_n = c_{n−1}/(n(2n + 1)).
(ii) c_0^* is arbitrary, and for n ≥ 1, c_n^* = c_{n−1}^*/(n(2n − 1)).
Iteration of the formula for c_n yields

n = 1:  c_1 = (1/(1 · 3)) c_0 = (2/(1 · 2 · 3)) c_0 = 2c_0/3!.
n = 2:  c_2 = (1/(2 · 5)) c_1 = (1/(2 · 3 · 5)) c_0 = 2^2 c_0/5!.
n = 3:  c_3 = (1/(3 · 7)) c_2 = (1/(3 · 7)) · (2^2 c_0/5!) = 2^3 c_0/7!.

Each c_n was rewritten by supplying the missing even factors (a factor of 2 at each step)
so that the denominator becomes (2n + 1)!. The general form of c_n is then

c_n = 2^n c_0/(2n + 1)!.

Similarly, the general form of c_n^* is found to be c_n^* = 2^n c_0^*/(2n)!. The two solutions
are

y_1 = c_0 Σ_{n=0}^{∞} (2^n/(2n + 1)!) x^n  and  y_2 = c_0^* x^{−1/2} Σ_{n=0}^{∞} (2^n/(2n)!) x^n.

Note that y_2 is not a power series.
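The closed forms c_n = 2^n c_0/(2n+1)! and c_n^* = 2^n c_0^*/(2n)! can be verified against the recurrences (i) and (ii) with exact arithmetic; a small sketch with c_0 = c_0^* = 1:

```python
from fractions import Fraction
from math import factorial

# Closed-form coefficients derived in the text (with c0 = c0* = 1)
a = [Fraction(2**n, factorial(2*n + 1)) for n in range(12)]   # r = 0 solution
b = [Fraction(2**n, factorial(2*n)) for n in range(12)]       # r = -1/2 solution

for n in range(1, 12):
    assert a[n] == a[n - 1] / (n * (2*n + 1))   # recurrence (i)
    assert b[n] == b[n - 1] / (n * (2*n - 1))   # recurrence (ii)
```

Each ratio a_n/a_{n−1} = 2/((2n)(2n+1)) = 1/(n(2n+1)) reproduces exactly the factor supplied by the recurrence.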
Example 171. Apply the method of Frobenius to obtain two linearly independent
series solutions of the differential equation

2xy'' − y' + 2y = 0

about the regular singular point x = 0.
Solution: Substituting

y = Σ_{n=0}^{∞} c_n x^{n+r},
y' = Σ_{n=0}^{∞} c_n (n + r) x^{n+r−1}  and
y'' = Σ_{n=0}^{∞} c_n (n + r)(n + r − 1) x^{n+r−2}

into the differential equation and collecting terms, we obtain

2xy'' − y' + 2y = (2r^2 − 3r) c_0 x^{r−1}
 + Σ_{k=1}^{∞} [2(k + r − 1)(k + r) c_k − (k + r) c_k + 2 c_{k−1}] x^{k+r−1} = 0,

which implies that

2r^2 − 3r = r(2r − 3) = 0

and

(k + r)(2k + 2r − 3) c_k + 2 c_{k−1} = 0.

The indicial roots are r = 0 and r = 3/2. For r = 0 the recurrence relation is

c_k = −2 c_{k−1}/(k(2k − 3)),  k = 1, 2, 3, . . .

and

c_1 = 2 c_0,  c_2 = −2 c_0,  c_3 = (4/9) c_0.

For r = 3/2 the recurrence relation is

c_k = −2 c_{k−1}/(k(2k + 3)),  k = 1, 2, 3, . . .

and

c_1 = −(2/5) c_0,  c_2 = (2/35) c_0,  c_3 = −(4/945) c_0.

The general solution is

y = C_1 (1 + 2x − 2x^2 + (4/9) x^3 + · · ·) + C_2 x^{3/2} (1 − (2/5) x + (2/35) x^2 − (4/945) x^3 + · · ·).
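Both recurrence relations of this example can be iterated in code to reproduce the coefficients above; a short sketch using exact rationals, with the indicial root r passed in as a parameter:

```python
from fractions import Fraction

def frobenius_coeffs(r, N=4):
    # Recurrence (k + r)(2k + 2r - 3) c_k = -2 c_{k-1}, with c_0 = 1
    c = [Fraction(1)]
    for k in range(1, N):
        c.append(Fraction(-2) * c[k - 1] / ((k + r) * (2*k + 2*r - 3)))
    return c

assert frobenius_coeffs(Fraction(0)) == [1, 2, -2, Fraction(4, 9)]
assert frobenius_coeffs(Fraction(3, 2)) == [1, Fraction(-2, 5), Fraction(2, 35), Fraction(-4, 945)]
```

The same four lines of recurrence code cover both indicial roots, since only r changes between the two series.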
5 35 945

2.8.4 Bessel’s Equation

x^2 y'' + x y' + (x^2 − v^2) y = 0        (2.108)

is called Bessel's equation.
Solution of Bessel's equation:
Because x = 0 is a regular singular point of Bessel's equation, we know that
there exists at least one solution of the form y = Σ_{n=0}^{∞} c_n x^{n+r}. Substituting
the last expression into (2.108) gives

x^2 y'' + x y' + (x^2 − v^2) y = Σ_{n=0}^{∞} c_n (n + r)(n + r − 1) x^{n+r}
 + Σ_{n=0}^{∞} c_n (n + r) x^{n+r} + Σ_{n=0}^{∞} c_n x^{n+r+2} − v^2 Σ_{n=0}^{∞} c_n x^{n+r}

= c_0 (r^2 − r + r − v^2) x^r + x^r Σ_{n=1}^{∞} c_n [(n + r)(n + r − 1) + (n + r) − v^2] x^n + x^r Σ_{n=0}^{∞} c_n x^{n+2}

= c_0 (r^2 − v^2) x^r + x^r Σ_{n=1}^{∞} c_n [(n + r)^2 − v^2] x^n + x^r Σ_{n=0}^{∞} c_n x^{n+2}.        (2.109)

From (2.109) we see that the indicial equation is r^2 − v^2 = 0, so the indicial
roots are r_1 = v and r_2 = −v. When r_1 = v, (2.109) becomes

x^v Σ_{n=1}^{∞} c_n n(n + 2v) x^n + x^v Σ_{n=0}^{∞} c_n x^{n+2}
 = x^v [ (1 + 2v) c_1 x + Σ_{n=2}^{∞} c_n n(n + 2v) x^n + Σ_{n=0}^{∞} c_n x^{n+2} ]
 = x^v [ (1 + 2v) c_1 x + Σ_{k=0}^{∞} [(k + 2)(k + 2 + 2v) c_{k+2} + c_k] x^{k+2} ] = 0.

Therefore by the usual argument we can write (1 + 2v) c_1 = 0 and

(k + 2)(k + 2 + 2v) c_{k+2} + c_k = 0,
or  c_{k+2} = −c_k/((k + 2)(k + 2 + 2v)),  k = 0, 1, 2, 3, . . .        (2.110)

The choice c_1 = 0 in (2.110) implies c_3 = c_5 = c_7 = · · · = 0, so only the even-index
coefficients k = 0, 2, 4, 6, . . . survive. After letting k + 2 = 2n, n = 1, 2, 3, . . . , we find that

c_{2n} = −c_{2n−2}/(2^2 n(n + v)).        (2.111)

Thus

c_2 = −c_0/(2^2 · 1 · (1 + v)),
c_4 = −c_2/(2^2 · 2(2 + v)) = c_0/(2^4 · 1 · 2 (1 + v)(2 + v)),
c_6 = −c_4/(2^2 · 3(3 + v)) = −c_0/(2^6 · 1 · 2 · 3 (1 + v)(2 + v)(3 + v)),
⋮
c_{2n} = (−1)^n c_0/(2^{2n} n! (1 + v)(2 + v) · · · (n + v)),  n = 1, 2, 3, . . .        (2.112)
234 Modern Engineering Mathematics

It is standard practice to choose c_0 to have a specific value, namely

c_0 = 1/(2^v Γ(1 + v)),

where Γ(1 + v) is the gamma function (see Appendix A). Since the gamma
function possesses the convenient property Γ(1 + α) = αΓ(α), we can reduce
the indicated product in the denominator of (2.112) to one term. For example:

Γ(1 + v + 1) = (1 + v)Γ(1 + v),
Γ(1 + v + 2) = (2 + v)Γ(2 + v) = (2 + v)(1 + v)Γ(1 + v).

Hence we can write (2.112) as

c_{2n} = (−1)^n/(2^{2n+v} n! (1 + v)(2 + v) · · · (n + v)Γ(1 + v)) = (−1)^n/(2^{2n+v} n! Γ(1 + v + n)),  for n = 0, 1, 2, . . .
Bessel function of the first kind:
Using the coefficients c_{2n} just obtained and r = v, a series solution of (2.108) is
y = Σ_{n=0}^{∞} c_{2n} x^{2n+v}. The solution is usually denoted by J_v(x):

J_v(x) = Σ_{n=0}^{∞} ((−1)^n/(n! Γ(1 + v + n))) (x/2)^{2n+v}.        (2.113)

If v ≥ 0, the series converges at least on the interval [0, ∞). Also, for the
second exponent r_2 = −v we obtain, in exactly the same manner,

J_{−v}(x) = Σ_{n=0}^{∞} ((−1)^n/(n! Γ(1 − v + n))) (x/2)^{2n−v}.        (2.114)

The functions J_v(x) and J_{−v}(x) are called Bessel functions of the first
kind of order v and −v, respectively. Depending on the value of v, (2.114)
may contain negative powers of x and hence converge on (0, ∞).
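For v = 0, (2.113) reduces to J_0(x) = Σ (−1)^n x^{2n}/(4^n (n!)^2); a truncated version can be substituted directly into Bessel's equation (2.108) to confirm that the residual is only truncation and rounding error. This numerical check is an illustration, not part of the text:

```python
from math import factorial

# Coefficients of J_0: c_{2n} = (-1)^n / (4^n (n!)^2), odd coefficients zero
N = 20
c = [0.0] * (2 * N)
for n in range(N):
    c[2 * n] = (-1) ** n / (4 ** n * factorial(n) ** 2)

def bessel_residual(x):
    """Evaluate x^2 y'' + x y' + x^2 y for the truncated J_0 series (v = 0)."""
    y = sum(ci * x ** i for i, ci in enumerate(c))
    yp = sum(i * ci * x ** (i - 1) for i, ci in enumerate(c) if i >= 1)
    ypp = sum(i * (i - 1) * ci * x ** (i - 2) for i, ci in enumerate(c) if i >= 2)
    return x ** 2 * ypp + x * yp + x ** 2 * y

assert abs(bessel_residual(1.0)) < 1e-10
```

With 20 even-order terms, the analytic truncation error at x = 1 is negligible and the residual is dominated by float rounding, well inside the tolerance.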

2.8.5 Legendre’s Equation

(1 − x^2) y'' − 2x y' + n(n + 1) y = 0        (2.115)

is known as Legendre's equation.
Solution of Legendre's equation: Since x = 0 is an ordinary point of the
equation, we substitute the power series y = Σ_{m=0}^{∞} c_m x^m (written with index m
to avoid clashing with the parameter n), shift summation indices, and combine series to get

(1 − x^2) y'' − 2x y' + n(n + 1) y = [n(n + 1) c_0 + 2 c_2] + [(n − 1)(n + 2) c_1 + 6 c_3] x
 + Σ_{j=2}^{∞} [(j + 2)(j + 1) c_{j+2} + (n − j)(n + j + 1) c_j] x^j = 0,

which implies that

n(n + 1) c_0 + 2 c_2 = 0,
(n − 1)(n + 2) c_1 + 6 c_3 = 0,
(j + 2)(j + 1) c_{j+2} + (n − j)(n + j + 1) c_j = 0,

so that

c_2 = −(n(n + 1)/2!) c_0,
c_3 = −((n − 1)(n + 2)/3!) c_1,
c_{j+2} = −((n − j)(n + j + 1)/((j + 2)(j + 1))) c_j,  j = 2, 3, 4, . . .        (2.116)
If we let j take on the values 2, 3, 4, . . . , the recurrence relation (2.116) yields

c_4 = −((n − 2)(n + 3)/(4 · 3)) c_2 = ((n − 2)n(n + 1)(n + 3)/4!) c_0,
c_5 = −((n − 3)(n + 4)/(5 · 4)) c_3 = ((n − 3)(n − 1)(n + 2)(n + 4)/5!) c_1,
c_6 = −((n − 4)(n + 5)/(6 · 5)) c_4 = −((n − 4)(n − 2)n(n + 1)(n + 3)(n + 5)/6!) c_0,
c_7 = −((n − 5)(n + 6)/(7 · 6)) c_5 = −((n − 5)(n − 3)(n − 1)(n + 2)(n + 4)(n + 6)/7!) c_1,

and so on. Thus for at least |x| < 1 we obtain two linearly independent power
series solutions:

y_1(x) = c_0 [ 1 − (n(n + 1)/2!) x^2 + ((n − 2)n(n + 1)(n + 3)/4!) x^4
 − ((n − 4)(n − 2)n(n + 1)(n + 3)(n + 5)/6!) x^6 + · · · ],        (2.117)

y_2(x) = c_1 [ x − ((n − 1)(n + 2)/3!) x^3 + ((n − 3)(n − 1)(n + 2)(n + 4)/5!) x^5
 − ((n − 5)(n − 3)(n − 1)(n + 2)(n + 4)(n + 6)/7!) x^7 + · · · ].

If n is an even integer, the first series terminates, whereas y_2(x) is an infinite
series. For example, if n = 4, then

y_1(x) = c_0 [ 1 − (4 · 5/2!) x^2 + (2 · 4 · 5 · 7/4!) x^4 ] = c_0 [ 1 − 10 x^2 + (35/3) x^4 ].

Similarly, when n is an odd integer, the series for y_2(x) terminates with x^n;
that is, when n is a non-negative integer, we obtain an nth degree polynomial
solution of Legendre's equation.
Since we know that a constant multiple of a solution of Legendre's equation
is also a solution, it is traditional to choose specific values for c_0 or c_1,
depending on whether n is an even or odd positive integer, respectively. For
n = 0 we choose c_0 = 1, and for n = 2, 4, 6, . . .

c_0 = (−1)^{n/2} (1 · 3 · · · (n − 1))/(2 · 4 · · · n);

whereas for n = 1 we choose c_1 = 1, and for n = 3, 5, 7, . . . ,

c_1 = (−1)^{(n−1)/2} (1 · 3 · · · n)/(2 · 4 · · · (n − 1)).

For example, when n = 4, we have

y_1(x) = (−1)^{4/2} (1 · 3)/(2 · 4) [ 1 − 10 x^2 + (35/3) x^4 ] = (1/8)(35 x^4 − 30 x^2 + 3).
These specific nth degree polynomial solutions are called Legendre polynomials
and are denoted by P_n(x). From the series for y_1(x) and y_2(x) and
from the above choices of c_0 and c_1 we find that the first several Legendre
polynomials are

P_0(x) = 1,
P_1(x) = x,
P_2(x) = (1/2)(3x^2 − 1),
P_3(x) = (1/2)(5x^3 − 3x),
P_4(x) = (1/8)(35x^4 − 30x^2 + 3),
P_5(x) = (1/8)(63x^5 − 70x^3 + 15x).        (2.118)
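The list (2.118) can be regenerated programmatically. The sketch below uses Bonnet's recurrence (n + 1)P_{n+1}(x) = (2n + 1)x P_n(x) − n P_{n−1}(x) — a standard identity for Legendre polynomials that is not derived in this section — with exact rational coefficients:

```python
from fractions import Fraction

def legendre(nmax):
    """Legendre polynomials P_0 .. P_nmax via Bonnet's recurrence.
    Each polynomial is a coefficient list [a0, a1, ...] in increasing powers of x."""
    P = [[Fraction(1)], [Fraction(0), Fraction(1)]]   # P0 = 1, P1 = x
    for n in range(1, nmax):
        prev, cur = P[n - 1], P[n]
        nxt = [Fraction(0)] * (n + 2)
        for i, a in enumerate(cur):        # (2n + 1) x P_n
            nxt[i + 1] += (2 * n + 1) * a
        for i, a in enumerate(prev):       # - n P_{n-1}
            nxt[i] -= n * a
        P.append([a / (n + 1) for a in nxt])
    return P

P = legendre(5)
# Matches (2.118): P4 = (35x^4 - 30x^2 + 3)/8, P5 = (63x^5 - 70x^3 + 15x)/8
assert P[4] == [Fraction(3, 8), 0, Fraction(-30, 8), 0, Fraction(35, 8)]
assert P[5] == [0, Fraction(15, 8), 0, Fraction(-70, 8), 0, Fraction(63, 8)]
```

Exact arithmetic makes the comparison with the listed polynomials a strict equality rather than a floating-point approximation.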

Remember, P_0(x), P_1(x), P_2(x), . . . are, in turn, particular solutions of the
differential equations

n = 0:  (1 − x^2) y'' − 2x y' = 0,
n = 1:  (1 − x^2) y'' − 2x y' + 2y = 0,
n = 2:  (1 − x^2) y'' − 2x y' + 6y = 0,
n = 3:  (1 − x^2) y'' − 2x y' + 12y = 0.

2.9 Exercises
2.1. Classify the given differential equations by order, and tell whether each is
linear or nonlinear.
(a) y 0 + 2xy = x2
(b) y 0 (y + x) = 6
(c) y cos y = y 00
(d) y 2 sin y = y 00
(e) y 00 − 4y 0 + 3y = x4
(f) y 00 = ezy
2.2. State whether the following differential equations are linear or non-
linear. Write the order of each equation.
(a) (1 − x2 )y 00 − 6xy 0 + 9y = sin x
(b) x (d^3y/dx^3) − 2 (dy/dx)^2 + y = 0
(c) yy 0 + 2y = 2 + x2
(d) d^2y/dx^2 + 9y = sin y
(e) dy/dx = (1 + (d^2y/dx^2)^2)^{1/2}
(f) d^2r/dt^2 = −k/r^2
Verify that in Exercises 2.3 to 2.8 the indicated function is a solution of the
given differential equation. In some cases assume an appropriate interval.
2.3. 2y' + y = 0; y = e^{−x/2}.

2.4. y'' + 16y = 0; y = sin 4x.

2.5. x^2 dy + 2xy dx = 0; y = −1/x^2.
2.6. y''' − 3y'' + 3y' − y = 0; y = x^2 e^x.
2.7. y' = y − 1; y = e^x + 1.
2.8. y'' + 9y = 8 sin x; y = sin x + λ_1 cos 3x + λ_2 sin 3x.

In Exercises 2.9 through 2.14 determine a region of the xy plane for which
the given differential equation would have a unique solution through a point
(x0 , y0 ) in the region.
2.9. dy/dx = √(xy)

2.10. dy/dx = y/x

2.11. dy/dx = y + x

2.12. y' = y^2/(x^2 + y^2)

2.13. dy/dx = x^2 cos y

2.14. dy/dx = (y + x)/(y − x)
2.15. Solve the following differential equations by the separation of variables
method:
(i) dx + e^{3x} dy = 0
(ii) dy/dx = (1 + x)^3
(iii) dy/dx = e^{3x−2y}
(iv) dP/dt = P − P^2
(v) dN/dt + N = N + e^{t+2}
(vi) dy/dx = (y^2 − 1)/(x^2 − 1)
2.16. Solve the following linear differential equations:
(i) (x^2 − 9) dy/dx + xy = 0
(ii) x dy/dx + 2y = 3
(iii) cos x dy/dx + (sin x) y = 1
2.17. Solve the following initial value problems:
(i) y − x dy/dx = 2y^2, y(1) = 5
(ii) (x + 1) dy/dx + y = ln x, y(1) = 10
2.18. Determine whether the given differential equation is exact. If it is exact,
solve it.

(i) (2x + y)dx − (x + 6y)dy = 0


(ii) (5x + 4y)dx + (4x − 8y 3 )dy = 0
(iii) x dy/dx = 2x e^x − y + 6x^2
2.19. Find the general solutions of the second order differential equations be-
low:
(i) y'' + 8y' + 16y = 0
(ii) y'' − 4y' + 5y = 0
(iii) 2y'' − 3y' + 4y = 0
2.20. Solve the given initial value problems:
(i) y'' + 16y = 0, y(0) = 2, y'(0) = −2
(ii) 4y'' − 4y' − 3y = 0, y(0) = 1, y'(0) = 5
2.21. Solve the following initial value problems:
(i) d^2y/dx^2 + 4y = −2,
    y(π/8) = 1/2,  dy/dx |_{x=π/8} = 2
(ii) 5 d^2y/dx^2 + 4 dy/dx = −6x,
    y(0) = 0,  dy/dx |_{x=0} = −10
2.22. Solve each differential equation below by variation of parameters.
(i) y'' + y = sin x
(ii) y'' + 3y' + 2y = 1/(1 + e^x)

2.23. Solve the following differential equations with or without initial values.
(i) x^2 d^2y/dx^2 − 7x dy/dx + 41y = 0
(ii) x d^2y/dx^2 − 3 dy/dx = 0
(iii) x^2 d^2y/dx^2 + 3x dy/dx = 0,  y(1) = 0,  dy/dx |_{x=1} = 4
2.24. Derive a population growth model that takes deaths into account.
2.25. A drug is infused into a patient’s blood system at a constant rate of r
grams per second. Simultaneously the drug is removed at a rate propor-
tional to the amount x(t) of the drug present at any time t. Determine
a differential equation governing the amount x(t).
2.26. Find the relation between doubling and tripling times for a population.
2.27. In an archaeological wooden specimen, only 25% of the original radiocarbon
(carbon-14) is present. Write a mathematical model, the solution of which will
give the time of its manufacture.
2.28. Write a mathematical model whose solution will provide the rate of
interest compounded continuously if a bank’s rate of interest is 10% per
annum.
2.29. The number of field mice in a certain pasture is given by the function
200−10t, where t is measured in years. Determine a differential equation
governing a population of owls that feed on the mice if the rate at which
the owl population grows is proportional to the difference between the
number of owls at the time t and the number of field mice at time t.
2.30. Let a dog start running to pursue a rabbit at time t_0, when the dog sights
the rabbit. Determine a differential equation (mathematical model)
whose solution will give the path of pursuit, assuming that the rabbit
runs in a straight line at a constant speed and that the dog's line of sight is
always directed at the rabbit.
2.31. To save money the manager of a manufacturing firm decides to eliminate
the advertising budget. In the absence of advertising, the sales manager
finds that sale in Indian rupees, decline at a rate that is directly propor-
tional to the volume of sales. Write a differential equation that describes
the rates of declining sales.
2.32. Suppose you deposited 10,000 Indian rupees in a bank account at an
interest rate of 5% compounded continuously. Write a mathematical
model in terms of a differential equation whose solution will give the
amount of money in your account after a year and a half.

2.33. Bacteria grown in a culture increase at a rate proportional to the num-


ber present. If the number of bacteria doubles every 2 hours, write a
mathematical equation describing this situation by which you can find
the population of bacteria (number) after 10 hours and 10 days.
2.34. The growth rate of a population of bacteria is directly proportional to
the population. If the number of bacteria in a culture grows from 100
to 400 in 24 hours, write the initial value problem to determine the
population after 12 hours.
2.35. A culture initially has P0 number of bacteria. At t = 1 hour, the number
of bacteria is measured to be 3/2P0 . If the rate of growth is proportional
to the number of bacteria P (t) present at time t, determine the time
necessary for the number of bacteria to triple.

2.36. Solve the logistic differential equation:


 
dN/dt = r_0 (1 − N/k) N,

t ≥ 0, N (0) = N0 .

2.37. Insects in a tank increase at a rate proportional to the number present.


If the number increases from 50,000 to 100,000 in 1 hour, how many
insects are present at the end of 2 hours?

2.38. It was estimated that the earth's human population in 1961 was 3.060
billion. Estimate the population in 1996 using the population growth
model (2.8), and check this number against the actual population of the earth
available from authentic sources.
2.39. A breeder reactor converts relatively stable uranium 238 into the iso-
tope plutonium 239. After 30 years 0.022% of the initial amount N0 of
plutonium has disintegrated. Find the half life of this isotope if the rate
of disintegration is proportional to amount remaining.
2.40. The radioactive Pb−209 isotope of lead decays at a rate proportional to
the amount present at time t and has a half life of 4 hours. If 1 gram
of lead is present initially, how long will it take for 80% of the lead to
decay?
2.41. In the 1950 excavation at Nippur in Babylonia, charcoal from a roof
beam gave a count of 4.09 disintegrations per minute per gram. Liv-
ing wood gave 6.68 disintegrations. Assuming that this charcoal was
formed during Hammurabi’s reign, find an estimate for the likely time
Hammurabi’s succession.

2.42. Suppose a large mixing tank initially holds 300 gallons of water in which
50 pounds of salt has been dissolved. Pure water is pumped into the tank
at a rate of 3 gal/min, and when the solution is well stirred it is pumped
out at the same rate. Write a differential equation for the amount A(t)
of salt in the tank at any time t.
2.43. A spherical rain drop evaporates at a rate proportional to its surface
area. Write a differential equation for its volume V as a function of
time.
2.44. A chemical A in a solution breaks down to form chemical B at a rate pro-
portional to the concentration of unconverted A. Half of A is converted
in 20 minutes. Write a differential equation describing this physical sit-
uation.
2.45. A tank with a capacity of 600 liters initially contains 200 liters of pure
water. A solution containing 3 kilograms of salt per liter is allowed to
run into the tank at a rate of 16 liters per minute. The mixture is then
removed at a rate of 12 liters per minute. Find the expression for the
number of kilograms of salt in the tank at any time t.
2.46. A large tank is filled with 600 liters of pure water. Brine containing 2
kilograms of salt per liter is pumped into the tank at a rate of 5 liters
per minute. The well mixed solution is pumped out at the same rate.
Find the number P (t) of kilograms of salt in the tank at time t. What
is the concentration of the solution in the tank at t = 10 minutes?
2.47. A 250 liter tank contains 100 liters of pure water. Brine containing 4
kilograms of salt per liter flows into the tank at a rate of 5 liters per
minute. If the well stirred mixture flows out at a rate of 3 liters per hour,
find the concentration of salt in the tank at the instant it is filled to the
top.
2.48. A thermometer reading 100◦ F is placed in a pan of oil maintained at
10◦ F . What is the temperature of the thermometer when t = 20 seconds,
if its temperature is 60◦ F when t = 8 seconds?
2.49. A thermometer is removed from a room where the air temperature is
60◦ F and is taken outside, where the temperature is 55◦ F . After 1
minute the thermometer reads 50◦ F . What is the reading of the ther-
mometer at t = 2 minutes? How long will it take for the thermometer
to reach 20◦ F .
2.50. Water is heated at 120◦ F . It is then removed from the burner and kept
in a room of 30◦ F temperature. Assuming that there is no change in the
temperature of the room and the temperature of the hot water is 110◦ F
after 3 minutes:
(a) Find the temperature of water after 6 minutes.

(b) Find the duration in which water will cool down to the room tem-
perature.
2.51. The diagram in Figure 2.3 represents an electric circuit in which voltage
of V volts is applied to a resistance of R ohms and an inductance of
L henrys connected in series.When the switch is closed, a current will
vary with time, and it can be shown that a mathematical model for this
circuit is the first order differential equation
L dI/dt + IR = V.
Verify that the current in the circuit is given by
I = (V/R)(1 − e^{−Rt/L}).

2.52. When an object at room temperature is placed in an oven whose temper-


ature is 400◦ C, the temperature of the object will increase with time,
approaching the temperature of the oven. It is known that the tem-
perature Q of the object is related to the time through the differential
equation

dQ/dt = α(Q − 400).

Verify that the temperature of the object is given by Q = 400 + c e^{αt},
where c and α are constants.
2.53. A car leaves at 11:30 am and arrives at a heart research center in a city
at 3 pm. Its driver started from rest and steadily increased his speed,
as indicated on his speedometer, to the extent that when he reached the
destination he was driving at a speed of 60 kilometer per hour. Write a
mathematical model in terms of differential equation which may help to
determine the distance to location from where the car started.
2.54. Find Laplace transforms of the functions:
(a) 2 sinh t − 4
(b) t2 − 3t + 5
(c) 4t sin 2t
(d) t − cos(5t)
(e) (t + 4)2
(f) 3e−t + sin 6t
(g) t3 − 3t + cos 4t
(h) −3 cos 2t + 5 sin 4t
(i) te4t

(j) t2 e−2t
(k) et cos t

(l) f(t) = { t, 0 ≤ t < 1;  1, t ≥ 1 }
(m) t(t − 2)e3t
(n) t3 − sinh 2t
(o) e−2t + 4et

(p) f(t) = { 1, 0 ≤ t < 2;  2, t ≥ 2 }
(q) g(t) = { sin t, 0 ≤ t ≤ π;  0, t ≥ π }
(r) f(t) = { e^t, 0 ≤ t < 2;  0, t ≥ 2 }
(s) L(t) = { 1, 0 ≤ t < 2;  2, 2 ≤ t < 4;  0, t ≥ 4 }

2.55. Show that t^n, where n is a positive integer, is of exponential order. Show
that the following functions are of exponential order.
(a) t1/2
(b) sinh t
(c) t sin 2t
2.56. Evaluate L^{−1}{ s^2/(s + 1)^3 }.

2.57. Evaluate L^{−1}{ s/(s^2 + 6s + 13) }.
2.58. Use the first shifting theorem to find the Laplace transforms of the
following functions.
(i) e^t (cos 2t − 3 sin 5t).
(ii) e^{−2t} cos 4t.

2.59. Use the first shifting theorem to solve the initial value problems.

y'' − 6y' + 9y = t^2 e^{3t},  y(0) = 2, y'(0) = 17.


2.60. Solve f(t) = t − ∫_0^t (t − u) f(u) du.

2.61. Solve f(t) + 2 ∫_0^t f(u) cos(t − u) du = 4e^t + sin t.

2.62. Solve f(t) + ∫_0^t f(u) du = 1.

2.63. Solve f(t) = 1 + t − (8/3) ∫_0^t (u − t)^3 f(u) du.
2.64. Solve y'' + y = 4δ(t − 2π), y(0) = 1, y'(0) = 0.
2.65. Solve y_1'' + 10y_1 − 4y_2 = 0, −4y_1 + y_2'' + 4y_2 = 0 subject to initial conditions
y_1(0) = 0, y_1'(0) = 1, y_2(0) = 0, y_2'(0) = −1.

2.66. Solve the initial value problem y' + 3y = 13 sin 2t, y(0) = 6.


2.67. Solve the initial value problem y''' + 2y'' − y' − 2y = sin 3t, y(0) = 0,
y'(0) = 0, y''(0) = 1.
2.68. Solve the initial value problems:

(a) y'' + 4y = sin t · H(t − 2π), y(0) = 1, y'(0) = 0.

(b) y'' − 5y' + 6y = H(t − 1), y(0) = 0, y'(0) = 1.
2.69. Let F(s) = L{f(t)} and n = 1, 2, 3, . . . ; then prove that

L{t^n f(t)} = (−1)^n (d^n/ds^n) F(s).

Power series

2.70. Write ex cos x in the form of a power series. Examine whether this power
series is convergent.
Solution about ordinary point
Find the general solution of the following differential equations about an or-
dinary point in terms of two power series.

2.71. y'' − (1 + x)y = 0.
2.72. y'' + x^2 y' = 0.
2.73. y'' + y = e^x.

2.74. (x^2 − 1)y' + y = 0.

2.75. y'' − (x + 1)y' − y = 0.
Use the power series method to solve the following initial value problems
2.76. (x − 1)y'' − xy' + y = 0, y(0) = −2, y'(0) = 6.

2.77. (x^2 + 1)y'' + 2xy' = 0, y(0) = 0, y'(0) = 1.
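For Exercise 2.77, substituting y = Σ c_k x^k into (x^2 + 1)y'' + 2xy' = 0 gives (k+2)(k+1)c_{k+2} + [k(k−1) + 2k]c_k = 0, i.e. the recursion c_{k+2} = −k c_k/(k + 2). A short sketch in Python with SymPy (an assumption about the reader's tools; the book's computing examples use MATLAB) builds the truncated series and substitutes it back:

```python
import sympy as sp

x = sp.symbols('x')

# Exercise 2.77: (x^2 + 1)y'' + 2xy' = 0, y(0) = 0, y'(0) = 1.
# Recursion from the power-series substitution: c_{k+2} = -k c_k / (k + 2).
N = 10
c = [sp.Integer(0), sp.Integer(1)]   # c_0 = y(0), c_1 = y'(0)
for k in range(N - 1):
    c.append(sp.Rational(-k, k + 2) * c[k])

series = sum(ck * x**k for k, ck in enumerate(c))
print(series)  # x - x**3/3 + x**5/5 - ..., the Maclaurin series of arctan x

# Substituting the truncated series back leaves only a truncation-order term
residual = sp.expand((x**2 + 1)*series.diff(x, 2) + 2*x*series.diff(x))
print(residual)
```

The surviving residual is a single term of the truncation order, confirming that all lower-order coefficients satisfy the recursion.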


246 Modern Engineering Mathematics

Solution about regular singular point: method of Frobenius


Use the method of Frobenius to solve the following differential equations.
2.78. xy'' − xy' + y = 0.
2.79. y'' + (3/x) y' − 2y = 0.
2.80. xy'' + y' + y = 0.
2.81. xy'' + y' + xy = 0.
Bessel’s equation
Find the general solution of the following equations.
2.82. x^2 y'' + xy' + (x^2 − 1)y = 0.
2.83. xy'' + xy' + xy = 0.
2.84. Verify that y = x^n J_n(x) is a particular solution of xy'' + (1 − 2n)y' + xy = 0,
x > 0.
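Exercise 2.84 can also be checked numerically. The sketch below uses Python with the mpmath library (an assumption about the reader's tools; mpmath ships with SymPy) to evaluate y = x^n J_n(x) and its derivatives at a few sample points:

```python
from mpmath import mp, besselj, diff

mp.dps = 30
n = 2  # illustrative choice; the identity holds for every positive integer n

# Exercise 2.84: y = x^n J_n(x) should satisfy x y'' + (1 - 2n) y' + x y = 0
y = lambda x: x**n * besselj(n, x)

for x0 in (0.7, 2.0, 5.5):
    residual = x0*diff(y, x0, 2) + (1 - 2*n)*diff(y, x0) + x0*y(x0)
    print(abs(residual) < 1e-12)  # True at each sample point
```

The residual vanishes (up to numerical error) because x^{n-1} factors out of the equation, leaving Bessel's equation of order n.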
Legendre’s equation
Solve the following equations.
2.85. (1 − x^2)y'' − 2xy' = 0.
2.86. (1 − x^2)y'' − 2xy' + 12y = 0 subject to initial conditions y(0) = 0,
y'(0) = 1.

2.10 Suggestion for Further Reading


In this chapter we have presented some well known methods of finding an-
alytic solutions of linear differential equations of first and second orders with
and without initial and boundary conditions. Solutions of second order differ-
ential equations applying Laplace transform as well as series solutions methods
with variable coefficients are discussed. Modeling of engineering problems in
terms of ordinary differential equations is explained. A natural question arises
as to what related areas to this chapter could be studied independently after
completing this chapter.
In Section 2.6, models under simple assumptions are considered. One may take
up the study of problems more realistic in nature, for example, problems from
references [3 through 5, 7, 11 through 15]. Ordinary differential equations are
also used in modeling of epidemics. One may pursue the study of this field
from references [2, 12].
In Section 2.6.3 we have discussed Newton’s cooling law and its formulation

as a differential equation. Newton’s cooling law is purely a surface principle; it


involves a boundary condition, and it leads to a surface temperature that de-
pends only on time. In extended bodies, interior temperature typically varies
not only in time but also from place to place. Principles that govern interior
temperature were first explained by Joseph Fourier (1768-1830) at the be-
ginning of the 19th century. His theory is explained in Chapter 4. Fourier’s
method hinges on the relationship between heat and temperature and on the
principle of conservation of energy. The equation governing distribution of
temperature is known as the heat equation; it is discussed in Chapter 5.
However, we recommend the books by Cannon [6] and Groetsch ([8, pp. 118-126]
and [9]) for a lucid introduction. It may be pointed out here that the heat
equation has been used to model option pricing, an important concept of
financial engineering which fetched a Nobel Prize in economics in 1997. Roos [12] provides
updated information on applications of ordinary differential equations to pop-
ulation dynamics which is quite useful for planning in different branches of
engineering. MATLAB programs are introduced. A reading of the book may
inculcate interest among engineering students and even faculty members for
applying differential equations in engineering problems.
As mentioned above, ordinary differential equations are used for modeling
epidemics; see for example a recent article by Beckley et al. [2]. References [3, 5,
7, 13, 14] provide valuable information on applications of ordinary differential
equations in diverse fields. In this chapter we have solved direct problems related
to differential equations; that is, we have found solutions of differential equations
in diverse fields and under certain conditions.
An inverse problem could be to find boundary or initial conditions or
parameters if the solution of a given differential equation is known. A simple
account of inverse problems associated with ordinary differential equations
is given by Groetsch [8, 9]. An introduction of inverse problems and their
applications to engineering problems is given in Chapter 9. The books by
Krisch [10], Bank and Kunish [1] and Groetsch [9] provide slightly advanced
level discussions on inverse problems in engineering fields.
It may be pointed out to the readers that solutions of inverse problems
have been important in the development of mathematics and different areas
of sciences and technology. That fact alone would justify the introduction of
inverse problems in the early years of undergraduate programs. But more im-
portant is the development of a habit of inverse thinking. If students consider
only the direct problem, they are not looking at the problem from all angles
and will fail to see two thirds of real world problems. The authors have
benefited from references [3, 4, 11, 13 through 15] while writing this chapter.
Bibliography

[1] H. T. Bank and K. Kunish, Estimation Techniques for Distributed Pa-


rameter Systems, Birkhäuser, Boston, 1989.
[2] R. Beckley et al., Modeling Epidemics with Differential Equations,
www.tnstate.edu/math.
[3] P. Blanchard, L. D. Robert and R. H. Glem, Ordinary Differential Equa-
tions, Richard Stratten, 2012.
[4] W. E. Boyce, R. C. DiPrima, Elementary Differential Equations and
Boundary Value Problems, Ninth Edition, Wiley, 2010.
[5] J. R. Brenan, W. E. Boyce, Differential Equations: An Introduction to
Modern Methods and Applications, John Wiley & Sons Inc, 2007.
[6] J. R. Cannon, The One–Dimensional Heat Equation, Addison Wesley,
1998.
[7] C. H. Edwards and D. E. Penney, Differential Equations: Computing and
Modeling, Fourth Edition, Pearson Prentice Hall, 2008.
[8] C. W. Groestch, Inverse Problems, Mathematical Association of America,
1999.
[9] C. W. Groestch, Inverse Problems in Mathematical Sciences, Verlag
Vieweg, Braunschweig, 1993.
[10] A. Krisch, An Introduction to the Mathematical Theory of Inverse Prob-
lems, Springer, New York, 1996.
[11] B. J. Rice and J. D. Strange, Ordinary Differential Equations with Ap-
plications, Third Edition, Brooks/Cole Publishing Company, 1994.
[12] A. M. De Roos, Modeling Population Dynamics, Amsterdam.
(staff.science.uva-nl/aroos/doenloads/pdf sreader/syllabus.pdf ), 2013.
[13] A. H. Siddiqi and P. Manchanda, A First Course of Differential Equations
with Applications, Macmillan, India Ltd., 2006.
[14] W. Xie, Differential Equations for Engineers, Cambridge University
Press, 2010.


[15] D. G. Zill, A First Course in Differential Equations with Modeling Ap-


plications, Seventh Edition, Brooks/Cole, 2001.
Chapter 3
Vector Calculus

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251


3.2 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
3.2.1 Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
3.2.2 Cross Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
3.3 Differential Calculus of Vector Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
3.3.1 Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
3.3.2 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
3.4 Integration in Vector Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
3.4.1 Line Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
3.4.2 Surface Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
3.5 Fundamental Theorems of Vector Calculus . . . . . . . . . . . . . . . . . . . . . . 293
3.5.1 Theorem of Green and Ostrogradski . . . . . . . . . . . . . . . . . . . . 294
3.5.2 Divergence Theorem of Gauss . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
3.5.3 Theorem of Stokes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
3.6 Applications of Vector Calculus to Engineering Problems . . . . . . 305
3.6.1 Elements of Vector Calculus and Physical World . . . . . . . 305
3.6.2 Applications of Line Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
3.6.3 Applications of Surface Integrals . . . . . . . . . . . . . . . . . . . . . . . . 317
3.6.4 Applications of Gauss Divergence Theorem . . . . . . . . . . . . . 321
3.6.5 Application of Stokes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 324
3.6.6 Example of Planar Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . . . . 325
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
3.8 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

3.1 Introduction
In the calculus course (Appendix B) we study properties of functions de-
fined on R (the line), R2 (the plane) or R3 (the space) with values in R, which
are called real valued or scalar functions, or scalar fields. Here, we study the
calculus of functions taking values in R2 or R3 , instead of R. Those functions
are called vector valued functions or vector fields. In Section 3.2 the concept
of a vector will be introduced along with its basic algebraic properties. Vec-
tor fields and their continuity and differentiation properties are discussed in
Section 3.3 along with the notions of gradient, divergence and curl. Moreover,


we explain how curves and surfaces are described by such functions. Integrals
of vector fields are introduced in Section 3.4, first the line integral for scalar
and vector fields, and then the surface integrals for scalar fields.
In Section 3.5, we present three fundamental theorems of vector calculus,
namely the Green-Ostrogradski theorem, the Gauss divergence theorem and
the theorem of Stokes. Section 3.6 is devoted to certain applications of vec-
tor calculus to science and engineering, with an emphasis on problems from
mechanics.
The concept of a vector can be traced to the development of affine and
analytic geometry in the 17th century and, later, the invention of complex
numbers and quaternions. Vector calculus was developed mainly during the
second half of the 19th century; contributions stem from, among others, Sir
William Rowan Hamilton (1805-1865), William Kingdon Clifford (1845-1879),
James Clerk Maxwell (1831-1879), Hermann Günter Grassmann (1809-1877)
and Josiah Willard Gibbs (1839-1903). However, the fundamental results men-
tioned above were already discovered by George Green (1793-1841), Carl
Friedrich Gauss (1777-1855), George Gabriel Stokes (1819-1903) and Mikhail Ostro-
gradski (1801-1862).
Vector calculus now serves as a basic mathematical tool in all areas of
science and engineering where mechanical, electromagnetic and thermody-
namics forces determine the behaviours of solids, fluids, electric conductors,
semiconductors and magnetic materials. Many ideas of this chapter are based
on references [7, 8]; reference [7] is out of print.

3.2 Vectors
Vectors and their basic properties have been introduced in Section 3.1.
In this section we discuss the concepts of magnitude of a vector and angle
between two vectors. The magnitude of a vector is called the norm. The
angle between two vectors is defined with the help of the inner product of two
vectors.
Definition 53. The inner product of two vectors x = (x1 , x2 , x3 ) and
y = (y1 , y2 , y3 ) denoted by x.y or (x, y) is defined by x.y = x1 y1 + x2 y2 + x3 y3 .
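In coordinates the inner product is a one-line computation. A minimal Python sketch (illustrative only; the book's own computing examples use MATLAB):

```python
# Inner (dot) product of two 3-vectors, exactly as in Definition 53.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

print(dot((1, 2, 3), (4, -5, 6)))  # 1*4 + 2*(-5) + 3*6 = 12
print(dot((1, 2, 3), (4, -5, 6)) == dot((4, -5, 6), (1, 2, 3)))  # True: x.y = y.x
```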
Position vector (2-space and 3-space)
The vector shown in Figure 3.1b with initial point at the origin O and terminal
point P (x1 , x2 ) is called the position vector of the point P and is written

OP = (x1 , x2 ).

Figure 3.2 depicts the position vector in 3-space, with initial point the origin
O and terminal point P (x1 , x2 , x3 ); it is written (x1 , x2 , x3 ). Addition

FIGURE 3.1a: Points on Line

FIGURE 3.1b: Points in Plane

and scalar multiplication have the property that

x + y = y + x, (x + y) + z = x + (y + z)

α(x + y) = αx + αy, (α + β)x = αx + βx, (αβ)x = α(βx),


hold for all vectors x,y and z and all scalars α and β. This follows from the
corresponding properties of ordinary addition and multiplication. We know
from elementary geometry that the length (or magnitude) of the vector x =
(x1 , x2 , x3 ) is given by the expression
√(x1^2 + x2^2 + x3^2).

Definition 54. The norm has the following properties, which are analogous
to the properties of the absolute value of a real number.

k x k≥ 0, and k x k= 0 if and only if x = 0, (3.1)

k αx k= |α| k x k for scalars α, (3.2)


k x + y k≤k x k + k y k . (3.3)

FIGURE 3.2: Points in R3

Properties (3.1) and (3.2) can be checked immediately from the definition.
It will be seen below that property (3.3) is a consequence of the Schwarz
inequality. Setting α = −1 in property (3.2), we obtain

k −x k=k x k (3.4)

that holds for every vector x.


The inequality k x + y k ≤ k x k + k y k is called the triangle inequality. Consider
the triangle formed by the points A = 0, B = x and C = x + y. The triangle
inequality states that the length of the side connecting A and C cannot be
larger than the sum of the length of the other two sides.
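The triangle inequality is easy to test numerically. The following Python sketch (illustrative; the sample vectors are arbitrary) computes norms from the length formula:

```python
import math

# Norm from the length formula, plus a numerical check of the
# triangle inequality ||x + y|| <= ||x|| + ||y||.
def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

x = (3.0, -1.0, 2.0)   # arbitrary sample vectors
y = (6.0, 4.0, 8.0)
s = tuple(xi + yi for xi, yi in zip(x, y))

print(norm(s) <= norm(x) + norm(y))  # True
```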
Example 172. A vector x with norm k x k= 1 is called a unit vector. If x
is a non-zero vector, then
x
e=
kxk
is a unit vector. The unit vectors

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

(they obviously satisfy k i k=k j k=k k k= 1) are also called unit coordinate
vectors or standard unit vectors, see Figure 3.3. Every vector can be
expressed as a linear combination of the unit coordinate vectors. For x =
(x1 , x2 , x3 ), we have
x = x1 i + x2 j + x3 k.
FIGURE 3.3: Unit Vectors in R3

Indeed, from the rules of addition and scalar multiplication we see that

x = (x1 , 0, 0) + (0, x2 , 0) + (0, 0, x3 )


= x1 (1, 0, 0) + x2 (0, 1, 0) + x3 (0, 0, 1)
= x1 i + x2 j + x3 k.

Example 173. Let us consider a vector x = x1 i+x2 j+x3 k. If x3 = 0, we have


x = x1 i + x2 j, so x lies in the plane spanned by the first two unit coordinate
vectors. We may identify this plane with the standard xy plane spanned by
unit coordinate vectors (1,0) and (0,1), which we continue to denote by i
and j. Every vector x = (x1 , x2 ) in the plane can be expressed as a linear
combination
x = x1 (1, 0) + x2 (0, 1) = x1 i + x2 j
of the unit coordinate vectors.
Example 174. (a) Find the magnitude of the vector 5i + 3k.
(b) Show that x = (−2/7, 3/7, 6/7) is a unit vector.

Solution: (a) Magnitude = k 5i + 3k k = √(25 + 9) = √34.

(b) k x k = √((−2/7)^2 + (3/7)^2 + (6/7)^2) = √((4 + 9 + 36)/49) = 1.

3.2.1 Scalar Product


Definition 55. The scalar product and the norm are related by

x.x = k x k^2 = x1^2 + x2^2 + x3^2,   k x k = √(x.x), (3.5)

which holds for every vector x = (x1 , x2 , x3 ). The scalar product is commu-
tative, that is,
x.y = y.x
holds for all vectors x and y. Moreover, it is distributive in the sense that
x.(y + z) = x.y + x.z (3.6)
(x + y).z = x.z + y.z (3.7)
(αx).y = α(x.y) = x.(αy) (3.8)
hold for all vectors x, y and z and all scalars α. In particular, setting α = 0
we have for every vector x
x.0 = 0 = 0.x.
All these rules follow immediately from the definition of the scalar product.

Geometric interpretation of the scalar product.


We look at the triangle in Figure 3.4 with vertices A, B and C which has a
right angle at B. Setting x = B − A, y = C − B, the theorem of Pythagoras
tells us that
k x + y k2 =k x k2 + k y k2 . (3.9)
Using (3.5) and the properties of the scalar product we compute

FIGURE 3.4: Geometric Interpretation of Pythagoras Theorem

k x + y k2 = (x + y).(x + y) = x.x + x.y + y.x + y.y =k x k2 +2x.y+ k y k2 .


Comparing with (3.9) we see that we must have x.y = 0.

Definition 56. Two vectors x and y are called orthogonal or perpendic-


ular if x.y = 0, and we write x ⊥ y in this case.
Example 175. Examine whether the vectors x = (2, 1, 1) and y = (1, 1, −3)
are orthogonal.
Solution: We have x.y = 2.1 + 1.1 + 1.(−3) = 0. This implies x ⊥ y.
Let us now look at the triangle with vertices A, B and C in Figure 3.5, setting
x = B − A and y = C − B. We want to determine the point D on the side BC
such that the line AD becomes perpendicular to BC. We have D − B = αy
for some scalar α, and A − D = x − αy. In order to determine α, we exploit
the orthogonality relation

0 = αy.(x − αy) = αy.x − α2 y.y


and solving for α we obtain
α = (x.y)/(y.y). (3.10)
Definition 57. Let x and y be vectors with y ≠ 0. The projection of x on y,
denoted by Py (x), is defined by

Py (x) = ((x.y)/(y.y)) y. (3.11)
The length of the projection is given by

k Py (x) k = |x.y| / k y k, (3.12)

since

k Py (x) k = k ((x.y)/(y.y)) y k = (|x.y|/(y.y)) k y k = (|x.y|/ k y k^2 ) k y k = |x.y| / k y k .
In the special case y is a unit vector, k y k= 1 and (3.11) becomes

Py (x) = (x.y)y. (3.13)

In this manner, setting

xI = Py (x), xII = x − Py (x), (3.14)

we can decompose an arbitrary vector x into a sum x = xI +xII of a vector xI


parallel to y and a vector xII perpendicular to y, provided y is nonzero. Indeed
(3.14) is the only way to get a decomposition with those properties, so the
decomposition is unique. This appears natural when we recall the geometric
construction above; we will not write a formal proof here. One may check
directly, however, that the vectors xI and xII are orthogonal. We have
xI .xII = Py (x).(x − Py (x)) = ((x.y)/(y.y)) y.(x − ((x.y)/(y.y)) y)

= ((x.y)/(y.y))(y.x) − ((x.y)^2 /(y.y)^2 )(y.y) = 0.

FIGURE 3.5: Projection Py (x) = αy
We now relate the projection vector to the angle θ in Figure 3.5. There θ is an
acute angle (less than 90 degrees), and we have by (3.12)

cos θ = k Py (x) k / k x k = |x.y| / (k y k k x k).
If the angle θ is larger than 90 degrees, then the cosine and the scalar product
x.y become negative. (The reader is urged to draw a picture similar to Figure
3.5 for this situation.) The general formula relating the angle θ between x
and y to the scalar product is
cos θ = (x.y) / (k x k k y k). (3.15)
Example 176. Find the angle between the vectors x = (2,3,2) and y =
(1,2,-1).
Solution: We determine the angle from formula (3.15) above. We have x.y =
2·1 + 3·2 + 2·(−1) = 6 and

k x k = √(4 + 9 + 4) = √17,   k y k = √(1 + 2^2 + (−1)^2 ) = √6.

Thus

cos θ = (x.y)/(k x k k y k) = 6/(√17 √6) = (1/17)√102 ≈ 10.1/17 ≈ 0.594,
and θ ≈ 0.935 radians, which is about 54 degrees. Since the cosine ranges
between −1 and 1, we immediately get from (3.15) the Schwarz inequality
(or Cauchy-Schwarz or Cauchy-Bunyakovski-Schwarz inequality)

|x.y| ≤k x kk y k, (3.16)

which is valid for any two vectors x and y. (Let us remark that one can
prove this inequality directly from the properties of the scalar product, with-
out recourse to the geometric construction used above.) Using the Schwarz

inequality, we may now prove the triangle inequality. Indeed, for two vectors
x and y we have

k x + y k2 = (x + y).(x + y) = x.x + y.x + x.y + y.y


≤ k x k^2 + 2|x.y| + k y k^2
≤ k x k^2 + 2 k x k k y k + k y k^2 by Schwarz’s inequality
= (k x k + k y k)^2 .

Taking the square roots of both sides we get

k x + y k≤k x k + k y k .

Example 177. Verify the triangle inequality for the vectors

(i) (3, −1, 2) and (6, 4, 8)
(ii) (0, 0, 1) and (1, 1, 0).

Solution: (i) x + y = (9, 3, 10), k x + y k = √(81 + 9 + 100) = √190,

k x k = √(9 + 1 + 4) = √14,   k y k = √(36 + 16 + 64) = √116;

clearly √190 < √14 + √116.
(ii) x + y = (1, 1, 1), k x + y k = √3, k x k = 1, k y k = 1;

k x + y k = √3 < 1 + 1, that is, √3 < 2.
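Formula (3.15) and the Schwarz inequality can be checked on the data of Example 176 with a short Python sketch (illustrative only):

```python
import math

# Angle between two vectors via (3.15), on the data of Example 176,
# plus a spot check of the Schwarz inequality |x.y| <= ||x|| ||y||.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = (2.0, 3.0, 2.0)
y = (1.0, 2.0, -1.0)

cos_theta = dot(x, y) / (norm(x) * norm(y))
print(round(cos_theta, 3))                        # 0.594
print(round(math.degrees(math.acos(cos_theta))))  # 54 (degrees)
print(abs(dot(x, y)) <= norm(x) * norm(y))        # True
```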

3.2.2 Cross Product


Definition 58. Let x = x1 i + x2 j + x3 k and y = y1 i + y2 j + y3 k. The cross
product (or vector product) of x and y is denoted by x × y and defined as

x × y = (x2 y3 − x3 y2 )i + (x3 y1 − x1 y3 )j + (x1 y2 − x2 y1 )k. (3.17)

Thus, the cross product of two vectors yields a vector.


The formula (3.17) for x × y is related to determinants as follows:

x × y = det [ i j k ; x1 x2 x3 ; y1 y2 y3 ]
= (x2 y3 − x3 y2 )i − (x1 y3 − x3 y1 )j + (x1 y2 − x2 y1 )k.

Actually, the expression after the first equality sign is not a true determinant,
since the first row consists of vectors, not of numbers, but it is a convenient
way to memorize (3.17). The standard rule of expansion along the first row,
with its three 2 × 2 minors, gives (3.17). Another way to memorize (3.17) is
to observe the cyclic behaviour of the indices.
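Formula (3.17) translates directly into code. This Python sketch (illustrative) verifies i × j = k and the orthogonality of x × y to both of its factors:

```python
# Cross product from formula (3.17), with quick checks that i x j = k
# and that x x y is orthogonal to both of its factors.
def cross(x, y):
    return (x[1]*y[2] - x[2]*y[1],
            x[2]*y[0] - x[0]*y[2],
            x[0]*y[1] - x[1]*y[0])

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j) == k)                          # True

x, y = (1, -1, 3), (2, 1, -1)
print(cross(x, y))                               # (-2, 7, 3)
print(dot(cross(x, y), x), dot(cross(x, y), y))  # 0 0
```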

Geometric interpretation of cross product


The vector x × y is geometrically related to the vectors x and y as follows.
Whenever the vectors x and y are not parallel, they form the sides of a par-
allelogram, see Figure 3.6. The side AB is formed by the vector x and has

FIGURE 3.6: Geometric Interpretation of Cross Product

length k x k, AD is formed by y with length k y k . As we know, the area of


a parallelogram with base b and height h is given by bh; here b =k x k and
h =k y k sin θ. It turns out (see the discussion of Lagrange identity below)
that the magnitude of x × y equals the area of that parallelogram, so

k x × y k = k x k k y k sin θ. (3.18)

Moreover, x×y is orthogonal to both x and y. Finally, from the two remaining
possibilities the direction of x × y is chosen according to the so-called right-
hand rule: Point the index finger in the direction of x and the middle finger
in the direction of y. The thumb then points in the direction of x × y. Below
and in the exercises we will show that these geometric properties follow from
Definition 3.2.6, if the coordinate system has a right hand orientation (index
finger points to i, middle finger points to j, thumb points to k.)
Algebraic properties of cross product
In contrast to the scalar product, the cross product is anticommutative,
that is,
y × x = −(x × y) (3.19)
holds for all vectors x and y. As a consequence, for all vectors x we have

x × x = 0. (3.20)

On the other hand, the cross product shares some properties of the scalar
product. The distributive laws

x × (y + z) = x × y + x × z (3.21)
(x + y) × z = x × z + y × z (3.22)
(αx) × y = α(x × y) = x × (αy) (3.23)
hold for all vectors x, y and z and all scalars α. In particular, setting α = 0
we have for every vector x

x × 0 = 0 = 0 × x.

The properties (3.19) through (3.23) can be verified directly from the definition
of the cross product.
Example 178. (a) Calculate x × y where x = (1, −1, 3) and y = (2, 1, −1).
(b) Show that i × j = k, j × k = i and k × i = j.
(c) Compute i × (i × j) and (i × i) × j.
(d) Compute x × y where x = i − j and y = i + k.
Solution: (a) We have

x × y = det [ i j k ; 1 −1 3 ; 2 1 −1 ] = ((−1)(−1) − 3·1)i − (1·(−1) − 3·2)j + (1·1 − (−1)·2)k
= −2i + 7j + 3k.

(b) We have i = (1, 0, 0) and j = (0, 1, 0). We compute

i × j = det [ i j k ; 1 0 0 ; 0 1 0 ] = (0·0 − 0·1)i − (1·0 − 0·0)j + (1·1 − 0·0)k = k,

so we have shown that i × j = k. Similarly we can prove the other two identities.
(c) We obtain

i × (i × j) = i × k = −j, (i × i) × j = 0 × j = 0.

(d) We calculate as before

x × y = det [ i j k ; 1 −1 0 ; 1 0 1 ] = ((−1)·1 − 0·0)i − (1·1 − 0·1)j + (1·0 − (−1)·1)k
= −i − j + k.

Alternatively we may use part (b) above together with properties (3.19)
through (3.23) to compute

x × y = (i − j) × (i + k) = i × i + i × k − j × i − j × k = −j + k − i = −i − j + k.

We note, as part (c) above shows, that in general

(x × y) × z ≠ x × (y × z).

Thus, the cross product is not associative. We come back to formula (3.18),
k x × y k=k x kk y k sin θ. From formula (3.15) we see that

k x k^2 k y k^2 − (x.y)^2 = k x k^2 k y k^2 (1 − cos^2 θ) = k x k^2 k y k^2 sin^2 θ,

so (3.18) is equivalent to the Lagrange identity

k x × y k2 =k x k2 k y k2 −(x.y)2 . (3.24)

Lagrange’s identity in turn is a special case (z = x, w = y) of the more


general formula

(x × y).(z × w) = (x.z)(y.w) − (y.z)(x.w). (3.25)

Therefore, formula (3.18) for the magnitude of the cross product follows from
(3.25).
Scalar triple product.
If x, y and z are vectors, the expression (x × y).z is called the scalar triple
product of x, y and z. Its absolute value |(x × y).z| represents the volume of
the parallelepiped with edges x, y and z.
Example 179. Show that the scalar triple product satisfies

(x × y).z = det [ x1 x2 x3 ; y1 y2 y3 ; z1 z2 z3 ].

Solution:

(x × y).z = ((x2 y3 − x3 y2 )i − (x1 y3 − x3 y1 )j + (x1 y2 − x2 y1 )k).(z1 i + z2 j + z3 k)

= z1 (x2 y3 − x3 y2 ) − z2 (x1 y3 − x3 y1 ) + z3 (x1 y2 − x2 y1 ),

which is exactly the expansion along the first row of the 3 × 3 determinant
above, with rows x, y and z.
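The identification of (x × y).z with a 3 × 3 determinant gives a convenient way to compute volumes. A Python sketch (illustrative; the edge vectors are arbitrary sample data):

```python
# Scalar triple product (x x y).z; its absolute value is the volume
# of the parallelepiped with edges x, y, z.
def cross(x, y):
    return (x[1]*y[2] - x[2]*y[1],
            x[2]*y[0] - x[0]*y[2],
            x[0]*y[1] - x[1]*y[0])

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def triple(x, y, z):
    return dot(cross(x, y), z)   # equals the determinant with rows x, y, z

x, y, z = (3, 1, 1), (1, 4, 1), (1, 1, 1)   # arbitrary sample edges
print(abs(triple(x, y, z)))  # 6, the volume of the parallelepiped
```

The value is unchanged under cyclic permutation of the three vectors, as the determinant form makes clear.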

Scalar and vector products often appear in physics and engineering. Work is
the scalar product of force and displacement. Torque and angular momentum
are the cross products of force and displacement or force and linear momen-
tum. Maxwell’s equations, which provide the foundation of electromagnetic
theory, involve both scalar and cross products of electrical and magnetic vari-
ables.

Example 180. Let x = 2i−3j +4k and y = −i+2j +5k. Find (i) (2x).(x−2y)
x.y
(ii) ( y).
y.y

Solution: (i) (2x).(x − 2y) = (4i − 6j + 8k).(2i − 3j + 4k + 2i − 4j − 10k)

= (4i − 6j + 8k).(4i − 7j − 6k)
= 16 + 42 − 48 = 10.

(ii) ((x.y)/(y.y)) y = (((2i − 3j + 4k).(−i + 2j + 5k)) / ((−i + 2j + 5k).(−i + 2j + 5k))) (−i + 2j + 5k)

= ((−2 − 6 + 20)/(1 + 4 + 25)) (−i + 2j + 5k)

= (2/5)(−i + 2j + 5k)

= −(2/5)i + (4/5)j + 2k.

Example 181. Examine whether the vectors i − j + k and −4i + 3j + 8k
are orthogonal.
Solution: (i − j + k).(−4i + 3j + 8k) = −4 − 3 + 8 = 1 ≠ 0, so the given
vectors are not orthogonal.
Example 182. Find the values of α such that the vectors αi + (1/2)j + αk and
−3i + 4j + αk are orthogonal.

Solution: If these two vectors are orthogonal then their dot product (inner
product) is 0, that is,

(αi + (1/2)j + αk).(−3i + 4j + αk) = 0
or − 3α + (1/2)·4 + α^2 = 0
or α^2 − 3α + 2 = 0
or (α − 1)(α − 2) = 0.

Thus the given vectors are orthogonal if α = 1 or α = 2.


Example 183. A sled is pulled horizontally over ice by a rope attached to
its front. A 10 lb force acting at an angle of 60° with the horizontal moves the
sled 50 ft. Find the work done.

Solution: Magnitude of the force: k F k = 10, θ = 60° and k d k = 50.
Then work done = W = k F k k d k cos θ = 10 (50)(1/2) = 250 ft lb.
Example 184. Verify Lagrange’s identity

k x × y k2 =k x k2 k y k2 −(x.y)2

for vectors x = (1, −1, 1) and y = (2, 3, −1).


Solution:

x × y = det [ i j k ; 1 −1 1 ; 2 3 −1 ] = i(1 − 3) − j(−1 − 2) + k(3 + 2)
= −2i + 3j + 5k.

Left Hand Side = k x × y k^2 = 4 + 9 + 25 = 38.
Right Hand Side = k x k^2 k y k^2 − (x.y)^2 = (1 + 1 + 1)(4 + 9 + 1) − (2 − 3 − 1)^2
= 3(14) − (−2)^2 = 42 − 4 = 38.

Left Hand Side = Right Hand Side, hence the identity is verified.
Example 185. (a) Find the area of the triangle determined by the points
having coordinates P1 (1, 1, 1), P2 (1, 2, 1), P3 (1, 1, 2).
(b) Find the volume of the parallelepiped for which the given vectors are three
edges x = 3i + j + k, y = i + 4j + k, z = i + j + k.
Example 186. Prove Lagrange’s identity:

k x × y k2 =k x k2 k y k2 −(x.y)2 .

Solution: Let x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ). Then

Left Hand Side = k x × y k^2 = (x2 y3 − x3 y2 )^2 + (x3 y1 − x1 y3 )^2 + (x1 y2 − x2 y1 )^2

= x2^2 y3^2 + x3^2 y2^2 − 2x2 y2 x3 y3 + x1^2 y3^2 + x3^2 y1^2 − 2x1 y1 x3 y3
+ x1^2 y2^2 + x2^2 y1^2 − 2x1 x2 y1 y2 .

Right Hand Side = k x k^2 k y k^2 − (x.y)^2

= (x1^2 + x2^2 + x3^2 )(y1^2 + y2^2 + y3^2 ) − (x1 y1 + x2 y2 + x3 y3 )^2

= x1^2 y2^2 + x1^2 y3^2 + x2^2 y1^2 + x2^2 y3^2 + x3^2 y1^2 + x3^2 y2^2
− 2x1 y1 x2 y2 − 2x1 x3 y1 y3 − 2x2 x3 y2 y3 .

The two sides agree term by term, so

k x × y k^2 = k x k^2 k y k^2 − (x.y)^2 .

3.3 Differential Calculus of Vector Fields


Let us begin with a specific situation. Let f1 , f2 , f3 be real valued functions
defined on some interval I; then for each t ∈ I we can form the vector

F(t) = f1 (t)i + f2 (t)j + f3 (t)k. (3.26)

All those points (that is, the points in the range of F ) form a curve in 3-space;
if we think of t as a time variable, we may imagine an object moving along
this curve, passing the point F(t) at time t. The functions f1 , f2 , f3 are called
the component functions, or simply the components of F.
In general, let us consider a function F defined on some set S. Whenever the
values of F are vectors, the function F is called a vector field (or a vector
valued function, or a vector function). Its domain S may be an interval
as above, or it may be a subset of the plane R2 or the space R3 . In the latter
case, the vector field has the form

F (x, y, z) = f1 (x, y, z)i + f2 (x, y, z)j + f3 (x, y, z)k. (3.27)

As an example, F may be the gravity field, which to every point P = (x, y, z) in


space associates the vector F(x, y, z) representing the gravity force at P . The
same situation occurs with the electric field and the magnetic field. Moreover,
the flow of a fluid gives rise to a velocity field which associates with each
point P = (x, y, z) the velocity vector of the fluid at this point. The form
(3.27) describes the situation when the velocity at a point P does not change
with time (this is called stationary flow); for nonstationary flow the vector
field F and its components fi depend on time, too,

F(x, y, z, t) = f1 (x, y, z, t)i + f2 (x, y, z, t)j + f3 (x, y, z, t)k, (3.28)

so in this case the domain S of F is a subset of 4-space. Many more examples


of vector fields will appear in the following sections. In the two-dimensional
situation, (3.27) simplifies to F(x, y) = f1 (x, y)i + f2 (x, y)j. In this case, the
values of F are vectors in R2 instead of R3 . As in Section 3.2 above, for
the argument vector (x, y, z) we may also write (x1 , x2 , x3 ), or x in compact
notation. The value of F at x is then simply denoted by F(x).

3.3.1 Curves
In this subsection we consider vector fields which are defined on an interval
I of the real line, as in (3.26). Such vector fields are commonly called curves,
since one imagines the points F(t) to form a curve in space as t varies in I.
F(t) = f (t)i + g(t)j, is called curves in parametric form in 2-space.
Definition 59. The vector parametric equation of a straight line in 3-space

passing through the points having coordinates (x1 , x2 , x3 ) and (y1 , y2 , y3 ) is


given by
γ = γ2 + tx, − ∞ < t < ∞ (3.29)
where x = (y1 − x1 , y2 − x2 , y3 − x3 ), γ2 = (y1 , y2 , y3 ), γ1 = (x1 , x2 , x3 ) and
γ = z = (z1 , z2 , z3 ). x = (y1 − x1 , y2 − x2 , y3 − x3 ) is called a direction vector
of the line. The scalar parametric form of the straight line (3.29) is

z1 = y1 + t(y1 − x1 ), z2 = y2 + t(y2 − x2 ), z3 = y3 + t(y3 − x3 ). (3.30)

Definition 60. (Planes in 3-space). Let P0 = (x1 , x2 , x3 ) be a point in R3


with position vector γ0 = x1 i + x2 j + x3 k. If n = a1 i + a2 j + a3 k is any given
non-zero vector then there is exactly one plane (flat surface) passing through
the point P0 and its equation is

n.(γ − γ0 ) = 0 (3.31)

where P = (p1 , p2 , p3 ) has position vector γ. Thus the equation of the plane
having non-zero normal vector n = a1 i + a2 j + a3 k and passing through the
point P0 = (x1 , x2 , x3 ) with position vector γ0 , has equation n.(γ − γ0 ) = 0.
Its scalar form is

a1 (p1 − x1 ) + a2 (p2 − x2 ) + a3 (p3 − x3 ) = 0.

Example 187. (a) The equation 2p1 − 3p2 − 4p3 = 0 represents a plane that
passes through the origin and is normal to the vector (2, −3, −4) = 2i−3j−4k.
(b) Find an equation of the plane that passes through the three points P =
(1, 1, 0), Q = (0, 2, 1) and R = (3, 2, −1).
Solution: We are required to find the normal vector n to the plane in order
to write the equation of the plane (3.31). Such a vector will be perpendicular
−−→ −→
to the vectors P Q = −i + j + k and P R = 2i + j − k. Therefore, by a property,
of the cross product ( The vector cross product is orthogonal to each vector):
n = PQ × PR = −2i + j − 3k,

and the equation (3.31) of the plane is

n.(γ − γ0) = (−2i + j − 3k).(p1 − 1, p2 − 1, p3) = 0,

where (p1 − 1, p2 − 1, p3) = γ − γ0, using (1, 1, 0) as γ0 and (p1, p2, p3) as the
position vector γ.
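The cross-product construction of Example 187(b) is easy to check numerically. The following short sketch (Python with NumPy, an illustrative tool choice alongside the text's MATLAB materials) recomputes the normal vector and verifies that all three given points satisfy the resulting plane equation:

```python
import numpy as np

# Points from Example 187(b)
P = np.array([1, 1, 0])
Q = np.array([0, 2, 1])
R = np.array([3, 2, -1])

# Normal vector n = PQ x PR; the cross product is orthogonal to both edge vectors
n = np.cross(Q - P, R - P)

# Every point of the plane satisfies n.(gamma - gamma0) = 0
residuals = [int(np.dot(n, X - P)) for X in (P, Q, R)]
print(n, residuals)
```

The zero residuals confirm that P, Q and R all lie on the plane with normal n = −2i + j − 3k.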

3.3.2 Distances
Definition 61. (a) (Distance from point to plane)
Let P0 = (x1, x2, x3) be a point and let P be the plane having equation a1 p1 + a2 p2 + a3 p3 = d. Then the distance between P0 and the plane, denoted by s, is given
by

s = |a1 x1 + a2 x2 + a3 x3 − d| / (a1² + a2² + a3²)^(1/2).

(b) (Distance between point and straight line)


Let P0 (x1 , x2 , x3 ) be a point and R be a straight line through a point P1
having position vector γ1 and parallel to a non-zero vector v. Then the distance
between P0 and the line, denoted by s, is given by

s = ‖(γ0 − γ1) × v‖ / ‖v‖.

Example 188. (a) Find the distance from (1, −1, 2) to the plane x−y−z = 9.
(b) Find the distance from (1, 0, −1) to the line γ = i + (1 + 3t)j − (3 − 4t)k
parallel to v = 3j + 4k passing through P1 = (1, 1, −3).
Solution: (a) s = |1 + 1 − 2 − 9| / √(1 + 1 + 1) = 9/√3 = 3√3.

(b) Here γ0 − γ1 = (1 − 1)i + (0 − 1)j + (−1 + 3)k = −j + 2k, so

s = ‖(−j + 2k) × (3j + 4k)‖ / √(3² + 4²) = ‖−10i‖ / 5 = 2.
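Both distance formulas of Definition 61 can be spot-checked numerically. The sketch below (Python with NumPy, an illustrative choice) reproduces the two computations of Example 188:

```python
import numpy as np

# (a) Distance from P0 = (1, -1, 2) to the plane x - y - z = 9
a = np.array([1, -1, -1])          # normal coefficients (a1, a2, a3)
p0 = np.array([1, -1, 2])
s_plane = abs(a @ p0 - 9) / np.linalg.norm(a)     # |a.x0 - d| / ||a||

# (b) Distance from (1, 0, -1) to the line through P1 = (1, 1, -3) parallel to v = 3j + 4k
g0 = np.array([1, 0, -1])
g1 = np.array([1, 1, -3])
v = np.array([0, 3, 4])
s_line = np.linalg.norm(np.cross(g0 - g1, v)) / np.linalg.norm(v)
print(s_plane, s_line)   # 9/sqrt(3) and 2
```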
Example 189. (a) γ(t) = 2 cos ti + 2 sin tj + tk, t ≥ 0 is a vector equation
of a circular helix.
(b) The parametric equation of the vector function γ(t) = 2 cos ti+2 sin tj +5k
is x = 2 cos t, y = 2 sin t, z = 5. Its geometrical representation is given in
Figure 3.7.
(c) The vector equation of the circle in 2-space with radius α and center (0,0)
is

γ(t) = (α cos t, α sin t)


= α cos ti + α sin tj.

(d) Sketch the graph of the curve traced by each of the following vector functions (vector fields):
(i) F(t) = cos ti + j + sin tk, t ≥ 0.
(ii) F (t) = 4i + 2 cos tj + 3 sin tk.
(iii) z = x2 + y 2 , y = x, x = t.
Definition 62. Let F be a vector field defined on an interval I, let t0 ∈ I be
given. We say that the vector L is the limit of F as t tends to t0 , and we write

lim F (t) = L,
t→t0

if limt→t0 k F (t) − L k= 0. We say that F is continuous at t0 if

lim F (t) = F (t0 ).


t→t0

We say that F is continuous on I if it is continuous at every point t0 ∈ I. The


limit process can be carried out by components.

Theorem 33. Let F(t) = f1(t)i + f2(t)j + f3(t)k and L = l1 i + l2 j + l3 k.
Then limt→t0 F(t) = L if and only if

limt→t0 f1(t) = l1, limt→t0 f2(t) = l2, limt→t0 f3(t) = l3. (3.32)

F is continuous at t0 if and only if all fi are continuous at t0 . F is continuous


on I if and only if all fi are continuous on I.
Verification: Since ‖F(t) − L‖ = √(Σi (fi(t) − li)²), we have 0 ≤
|fi(t) − li| ≤ ‖F(t) − L‖ for all i; the properties of limits and the Sandwich
theorem imply the first assertion. The other assertions are then consequences
of Definition 62.
Remark 36. If limt→t0 F(t) = L then limt→t0 ‖F(t)‖ = ‖L‖. As for scalar
functions, the converse is false: limt→t0 ‖F(t)‖ = ‖L‖
does not imply limt→t0 F(t) = L.
Verification: By the reverse triangle inequality (see Example 9.2.3(d))

0 ≤ | k F (t) k − k L k | ≤k F (t) − L k .

It follows from the Sandwich theorem that if

lim k F (t) − L k= 0, then lim (k F (t) k − k L k) = 0.


t→t0 t→t0

For the converse choose F (t) = x and L = −x, where x is any fixed non-zero
vector. We have limt→t0 k F (t) k=k x k=k L k, but limt→t0 F (t) = x 6= −x =
L. The usual properties of the limit can be extended for limits of vector fields.
Let F(t) → L, G(t) → M and α(t) → A as t → t0. Then

limt→t0 (F(t) + G(t)) = limt→t0 F(t) + limt→t0 G(t) = L + M, (3.33)

limt→t0 (α(t)F(t)) = AL; in particular, limt→t0 (βF(t)) = βL for a scalar constant β. Moreover,

limt→t0 (F(t).G(t)) = (limt→t0 F(t)).(limt→t0 G(t)) = L.M, (3.34)

limt→t0 (F(t) × G(t)) = L × M. (3.35)

Definition 63. (Derivative of vector field) We define the derivative of F at


t0 by
F'(t0) = limh→0 (F(t0 + h) − F(t0))/h
if this limit exists. In that case, F is called differentiable at t0 . Alternatively,
F'(t0) is also denoted as (dF/dt)(t0). Geometrically, F'(t0) points along the tangent to the curve at the point F(t0), see Figure 3.7. Differentiation can be

FIGURE 3.7: Geometric Interpretation of Derivative of Vector Field

carried out component by component, that is, if F (t) = f1 (t)i+f2 (t)j +f3 (t)k
is differentiable at t, then

F 0 (t) = f10 i + f20 j + f30 k. (3.36)

Indeed, applying rules (3.7) and (3.9) we see that

F'(t) = limh→0 (F(t + h) − F(t))/h
= limh→0 [((f1(t + h) − f1(t))/h) i + ((f2(t + h) − f2(t))/h) j + ((f3(t + h) − f3(t))/h) k]
= [limh→0 (f1(t + h) − f1(t))/h] i + [limh→0 (f2(t + h) − f2(t))/h] j + [limh→0 (f3(t + h) − f3(t))/h] k
= f1'(t)i + f2'(t)j + f3'(t)k.

Differentiation properties of vector fields. We have already seen in cal-


culus courses that we may define, for example, the sum f + g of two functions
by (f + g)(x) = f (x) + g(x). In this manner, operations performed on function
values give rise to operations on the functions themselves. The same applies

to vector fields. Let F and G be two vector fields defined on an interval I, and
let α, β be scalars. We define

(F + G)(t) = F (t) + G(t), (αF )(t) = αF (t), (3.37)

thus (αF + βG)(t) = αF (t) + βG(t), (3.38)


(F.G)(t) = F (t).G(t), (3.39)
(F × G)(t) = F (t) × G(t). (3.40)
In addition, when v is a scalar function defined on I, we set

(vF)(t) = v(t)F(t). (3.41)

We also consider the composition

(F ◦ u)(t) = F (u(t))

for a scalar function u whose range is contained in I. The following rules hold
for differentiation.
(F + G)0 (t) = F 0 (t) + G0 (t) (3.42)
(αF )0 (t) = αF 0 (t) for constants α, (3.43)
(vF )0 (t) = v(t)F 0 (t) + v 0 (t)F (t), (3.44)
(F.G)'(t) = F(t).G'(t) + F'(t).G(t), (3.45)
and the chain rule for vector functions

(F ◦ u)0 (t) = F 0 (u(t))u0 (t) = u0 (t)F 0 (u(t)). (3.46)

The formulas (3.42) through (3.46) are consequences of the definitions, the
componentwise formula (3.36) and the corresponding differentiation rules for
scalar functions . They can also be written in Leibniz notation, for example
d/dt (F + G) = dF/dt + dG/dt,

d/dt (F.G) = F.(dG/dt) + (dF/dt).G,

d/dt (F × G) = F × (dG/dt) + (dF/dt) × G.
Example 190. Let F(t) = 2t²i − 3j, G(t) = i + tj + t²k, u(t) = (1/3)t³. Verify
(3.45) and (3.46) for these functions.

Solution: We first compute the derivatives F'(t) = 4ti, G'(t) = j + 2tk,
u'(t) = t². We verify (3.45). We have (F.G)(t) = (2t²i − 3j).(i + tj + t²k) =
2t² − 3t, therefore the left hand side becomes (F.G)'(t) = 4t − 3. For the right
hand side we get

F(t).G'(t) + F'(t).G(t) = (2t²i − 3j).(j + 2tk) + 4ti.(i + tj + t²k) = −3 + 4t,

so indeed both sides are equal. We now consider the corresponding product rule for the cross product. We have

(F × G)(t) = (2t2 i − 3j) × (i + tj + t2 k) = −3t2 i − 2t4 j + (2t3 + 3)k,

so the left hand side equals

(F × G)0 (t) = −6ti − 8t3 j + 6t2 k.

We compute the right hand side

F(t) × G'(t) + F'(t) × G(t) = (2t²i − 3j) × (j + 2tk) + 4ti × (i + tj + t²k)


= (2t2 k − 4t3 j − 6ti) + (4t2 k − 4t3 j)
= −6ti − 8t3 j + 6t2 k,

therefore both sides are equal. Finally, we investigate (3.46). We have


(F ∘ u)(t) = 2((1/3)t³)²i − 3j = (2/9)t⁶i − 3j, (F ∘ u)'(t) = (2/9)·6t⁵ i = (4/3)t⁵ i
on the left hand side, and
F'(u(t))u'(t) = [4u(t)i]u'(t) = [4((1/3)t³)i]t² = (4/3)t⁵ i
on the right hand side.
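The verifications in Example 190 can also be delegated to a computer algebra system. The following sketch uses Python's SymPy (an illustrative choice; the text's software of choice is MATLAB) to confirm the dot- and cross-product differentiation rules symbolically:

```python
import sympy as sp

t = sp.symbols('t')
F = sp.Matrix([2*t**2, -3, 0])     # F(t) = 2t^2 i - 3j
G = sp.Matrix([1, t, t**2])        # G(t) = i + tj + t^2 k

# Rule (3.45): (F.G)' = F.G' + F'.G
dot_diff = sp.simplify(sp.diff(F.dot(G), t) - (F.dot(G.diff(t)) + F.diff(t).dot(G)))

# Product rule for the cross product: (F x G)' = F x G' + F' x G
cross_diff = sp.simplify(F.cross(G).diff(t) - (F.cross(G.diff(t)) + F.diff(t).cross(G)))
print(dot_diff, cross_diff.T)   # both differences vanish
```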

Example 191. (a) Let F(t) = ti + √(t + 1) j − e^t k. Find F'(t), F'(0), F''(t), F''(0) and F(t).F'(t).
(b) Find F 00 (t) if F (t) = t sin ti + e−t j + tk.

Solution: (a) We have f1(t) = t, f2(t) = √(t + 1) and f3(t) = −e^t. We get

F'(t) = f1'(t)i + f2'(t)j + f3'(t)k = i + (1/(2√(t + 1)))j − e^t k,

F'(0) = i + (1/2)j − k, since e^0 = 1 and √(t + 1) = 1 for t = 0,

F''(t) = 0i − (1/(4(t + 1)^(3/2)))j − e^t k,

F''(0) = −(1/4)j − e^0 k = −(1/4)j − k,

F(t).F'(t) = (ti + √(t + 1)j − e^t k).(i + (1/(2√(t + 1)))j − e^t k) = t + 1/2 + e^(2t).
(b) We have f1 (t) = t sin t, f2 (t) = e−t and f3 (t) = t. We get

F 0 (t) = f10 (t)i + f20 (t)j + f30 (t)k = (t cos t + sin t)i − e−t j + k,

F 00 (t) = (−t sin t + cos t + cos t)i + e−t j + 0k = (2 cos t − t sin t)i + e−t j.
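Symbolic differentiation reproduces the derivatives of Example 191(a). The sketch below (SymPy, an illustrative tool choice) checks F'(0), F''(0) and F(t).F'(t):

```python
import sympy as sp

t = sp.symbols('t')
F = sp.Matrix([t, sp.sqrt(t + 1), -sp.exp(t)])   # F(t) = ti + sqrt(t+1) j - e^t k
Fp = F.diff(t)                                   # componentwise derivative, as in (3.36)
Fpp = F.diff(t, 2)

print(Fp.subs(t, 0).T)        # (1, 1/2, -1)
print(Fpp.subs(t, 0).T)       # (0, -1/4, -1)
print(sp.simplify(F.dot(Fp))) # t + 1/2 + exp(2t)
```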

Vector Fields in Several Dimensions


In this section we introduce the concepts of gradient, divergence and curl.
These are based on the notion of partial derivatives, which was introduced
earlier. Since all kinds of derivatives of a function f at a certain point
x are defined as limits involving function values at nearby points, it is most
convenient if the domain of f always includes nearby points. This is made
precise in the following definition.
Definition 64. (Open set) A subset D of Rn is called open if for every
x ∈ D there exists a ball B with center x which is contained in D. Thus, if
x ∈ D and if the ball B in question has radius r, then every point z whose
distance from x is smaller than r must also belong to D. For example, the
interior D = {(x, y, z) : 0 < x, y, z < 1} of a cube is open, while if we include its
boundary, the corresponding set R = {(x, y, z) : 0 ≤ x, y, z ≤ 1} is not open.

Let us note that the definition of an open set in the plane is a special case
of Definition 1.3.6.
Gradient.
Let f be a scalar function of three variables. The gradient of f at a point
(x, y, z) is defined as the vector

∂f ∂f ∂f
∇f (x, y, z) = (x, y, z)i + (x, y, z)j + (x, y, z)k. (3.47)
∂x ∂y ∂z

If f and its partial derivatives are defined on some open subset D of R3 , the
gradient thus becomes a vector field ∇f : D → R3 whose component func-
∂f ∂f ∂f
tions are just the partial derivatives , , . According to the notation
∂x ∂y ∂z
of vectors, we may also write
∂f ∂f ∂f
∇f (x, y, z) = ( (x, y, z), (x, y, z), (x, y, z), ).
∂x ∂y ∂z

For f (x, y, z) = x2 y sin z, we have

∂f/∂x (x, y, z) = 2xy sin z, ∂f/∂y (x, y, z) = x² sin z, ∂f/∂z (x, y, z) = x²y cos z,

and thus ∇f(x, y, z) = 2xy sin z i + x² sin z j + x²y cos z k, or ∇f(x, y, z) =
(2xy sin z, x² sin z, x²y cos z). For a function f of two variables (x, y),

∇f(x, y) = ∂f/∂x (x, y) i + ∂f/∂y (x, y) j.
In n dimensions, we define

∇f(x) = Σi ∂i f(x) ei, (3.48)

where x = (x1, . . . , xn), ∂i f = ∂f/∂xi, the sum runs over i = 1, . . . , n, and ei
is the ith canonical unit vector. The gradient obeys
the rules
∇(f + g)(x) = ∇f (x) + ∇g(x), (3.49)
∇(αf )(x) = α∇f (x), if α is a scalar, (3.50)
∇(f g)(x) = f (x)∇g(x) + g(x)∇f (x). (3.51)
Those rules can be verified componentwise by the corresponding rules for
partial derivatives.
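The gradient of the example above, and the product rule (3.51), can be checked symbolically. A small SymPy sketch (an illustrative tool choice; the second field g is an arbitrary assumption of this illustration):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y * sp.sin(z)
grad_f = sp.Matrix([sp.diff(f, v) for v in (x, y, z)])
print(grad_f.T)   # (2xy sin z, x^2 sin z, x^2 y cos z)

# Rule (3.51): grad(fg) = f grad g + g grad f, for an arbitrarily chosen second field g
g = sp.exp(x) + y * z
lhs = sp.Matrix([sp.diff(f * g, v) for v in (x, y, z)])
rhs = f * sp.Matrix([sp.diff(g, v) for v in (x, y, z)]) + g * grad_f
print(sp.simplify(lhs - rhs).T)   # zero vector
```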
Linearization. In the case of a function f of a single variable we have, as a
consequence of the definition of the derivative,

f (x + h) = f (x) + f 0 (x)h + r(h) (3.52)

for some remainder term r(h) with r(h)/h → 0 as h → 0. For functions of
several variables, the gradient plays the corresponding role.

Definition 65. Let f be a real valued function of n variables defined in an


open subset D of Rn . We say that f is differentiable at a point x ∈ D, if all
partial derivatives of f exist at x and

limh→0 |f(x + h) − f(x) − ∇f(x).h| / ‖h‖ = 0. (3.53)

As a consequence, for a differentiable function f we obtain the analogue of


(3.52) in several dimensions as

f(x + h) = f(x) + ∇f(x).h + r(h), (3.54)

where r(h) is a remainder term with r(h)/‖h‖ → 0 as h → 0. For this reason, the function
g defined by
g(z) = f(x) + ∇f(x).(z − x)
is called the linearization of f at x. Note that the difference

g(x + h) − f (x + h) = r(h)

converges to 0 faster than ‖h‖. The notation

r(h) = o(‖h‖)

(read as: r(h) is small o of ‖h‖) is a common way of stating that r(h)/‖h‖ → 0 as h → 0.
It would be very inconvenient if, whenever we want to apply linearization, we
had to check the validity of (3.53) explicitly. Usually this is not necessary
because of the following theorem.
Theorem 34. Let f be a real valued function of n variables which is contin-
uously differentiable in an open subset D of Rn . Then f is differentiable at
every point x ∈ D.
This means that we only have to check whether all partial derivatives of
f are continuous in D, which is often easy to verify. Closely related to the gradient is the directional derivative. We
present its definition in the general case of n dimensions.

Definition 66. (Directional derivative) Let f be a real valued function of n


variables. For each vector u ∈ Rn , the limit

fu'(x) = limt→0 (f(x + tu) − f(x))/t,
if it exists, is called the directional derivative of f at x in the direction u.
Instead of fu0 (x), we also write ∂u f (x).

Remark 37. (a) When u = ei is the ith canonical unit vector, the directional
derivative fu0 becomes the partial derivative ∂i f. In particular, for a function
f of three variables (x, y, z), we obtain

∂f/∂x (x, y, z) = fi'(x, y, z) = ∂x f(x, y, z), i = (1, 0, 0),
and similarly
∂f ∂f
= fj0 = ∂y f and = fk0 = ∂z f.
∂y ∂z
(b) As we already know, the partial derivatives ∂f/∂x, ∂f/∂y and ∂f/∂z give the rates
of change of f in the directions i, j and k respectively. Analogously, if u
is a unit vector, the directional derivative fu'(x) gives the rate of change of f
in the direction u.
(c) The geometrical interpretation of the directional derivative is essentially
the same as that of the partial derivative, except that we now look at tan-
gents to the graph of f in arbitrary directions u, not only in the direction of
the coordinate axes. The following theorem gives the connection between the
gradient of f at x and its directional derivative at x.

Theorem 35. If f is differentiable at x, then f has a directional derivative


at x in every direction u, and

fu0 (x) = ∂u f (x) = ∇f (x).u. (3.55)

In particular, all partial derivatives ∂i f exist.


Proof. Let u ∈ Rn be given, assume u 6= 0 (otherwise, the assertion is trivially
satisfied). In order to verify the definition of the directional derivative, we
consider the identity below, valid for all t ≠ 0:

|(f(x + tu) − f(x))/t − ∇f(x).u| = |f(x + tu) − f(x) − t∇f(x).u| / |t|

= (|f(x + tu) − f(x) − t∇f(x).u| / ‖tu‖) ‖u‖.
Since f is differentiable at x, the fraction on the right hand side converges to
0, see Definition 65. Therefore, the left hand side too converges to 0, which
yields the assertion.
Theorem 35 yields an important geometric property of the gradient. According to the formula relating the dot product to the angle θ between ∇f(x) and any vector u,

∇f(x).u = ‖∇f(x)‖ ‖u‖ cos θ.

If moreover u is a unit vector and ∇f (x) is non-zero, we get from (3.55) that

fu'(x) = ‖∇f(x)‖ cos θ.

Since −1 ≤ cos θ ≤ 1, we have

−‖∇f(x)‖ ≤ fu'(x) ≤ ‖∇f(x)‖.

In particular, if u points in the direction of ∇f (x), then θ = 0 and cos θ = 1,


so
fu'(x) = ‖∇f(x)‖, (3.56)
whereas, if u points in the direction of −∇f(x), then θ = π and cos θ = −1, so

fu'(x) = −‖∇f(x)‖. (3.57)

Since the directional derivative gives the rate of change of the function in that
direction, it follows that the function f increases most rapidly in the direction
of the gradient and decreases most rapidly in the opposite direction.

Example 192. Let f (x, y, z) = xyz. Compute ∇f (1, 1, 1) and determine the
maximum and minimum rates of change of f at (1,1,1).

Solution: We have

∂f/∂x (x, y, z) = yz, ∂f/∂y (x, y, z) = xz, ∂f/∂z (x, y, z) = xy.

Therefore

∇f(x, y, z) = yzi + xzj + xyk = (yz, xz, xy),

and hence ∇f(1, 1, 1) = i + j + k = (1, 1, 1). According to (3.56) and (3.57),
the maximum and minimum rates of change are ‖∇f(1, 1, 1)‖ = √3 and
−‖∇f(1, 1, 1)‖ = −√3 respectively.
Example 193. Calculate the directional derivative of φ(x, y, z) = 8xy 2 − xz
at an arbitrary point (x, y, z) in the direction of the vector
u = (1/√3)i + (1/√3)j + (1/√3)k = (1/√3, 1/√3, 1/√3).
Solution: According to Theorem 35, we have

∂u ϕ(x, y, z) = ϕ0u = ∇ϕ(x, y, z).u.

We compute

∂ϕ/∂x (x, y, z) = 8y² − z, ∂ϕ/∂y (x, y, z) = 16xy, ∂ϕ/∂z (x, y, z) = −x,

hence ∇ϕ(x, y, z) = (8y² − z, 16xy, −x) and

∂u ϕ(x, y, z) = (8y² − z, 16xy, −x).(1/√3, 1/√3, 1/√3) = (8y² − z + 16xy − x)/√3.
Example 194. Let the temperature at a point (x, y) on a metallic plate in
the xy-plane be given by T(x, y) = xy/(1 + x² + y²) degrees Celsius. (a) Find the
rate of change of temperature at (1, 1) in the direction of u = 2i − j.
(b) An insect at (1,1) wants to walk in the direction in which the temperature
drops most rapidly. Find a unit vector in that direction.

Solution: (a) We have

∂T/∂x (x, y) = y(1 − x² + y²)/(1 + x² + y²)², ∂T/∂y (x, y) = x(1 + x² − y²)/(1 + x² + y²)²

so at (x, y) = (1, 1) we have

∇T(1, 1) = (1/9)i + (1/9)j = (1/9, 1/9).

The directional derivative gives the correct rate of change only after we normalize the vector u to unit length, so we set e = u/‖u‖ = (2i − j)/√5 and compute
the rate of change as

∂e T(1, 1) = ∇T(1, 1).e = ((1/9)i + (1/9)j).((2i − j)/√5) = 1/(9√5).
(b) The temperature drops most rapidly in the direction of −∇T(1, 1).
Using part (a), we compute a unit vector e in that direction as

−∇T(1, 1) = −(1/9)i − (1/9)j, ‖−∇T(1, 1)‖ = √2/9,

so

e = −∇T(1, 1)/‖−∇T(1, 1)‖ = −(1/√2)i − (1/√2)j.
Example 195. Let the temperature at each point of a metal plate be given
by the function
T(x, y) = e^x cos y + e^y cos x.
(a) In what direction does the temperature increase most rapidly at the point
(0,0)? What is the rate of increase?
(b) In what direction does the temperature decrease most rapidly at (0,0)?
Solution: We first compute
∇T(x, y) = ∂T/∂x (x, y) i + ∂T/∂y (x, y) j = (e^x cos y − e^y sin x)i + (e^y cos x − e^x sin y)j.
(a) At (0, 0) the temperature increases most rapidly in the direction of the
gradient ∇T(0, 0) = i + j. The rate of increase is ‖∇T(0, 0)‖ = ‖i + j‖ = √2.
(b) The temperature decreases most rapidly in the direction of −∇T (0, 0) =
−i − j.
We repeat that the gradient vector ∇f(x) tells us the direction of the steepest
climb of f at the point x, and its length, ‖∇f(x)‖, gives the steepness.
Divergence and rotation.
Here we present the definition and some properties of the differential expres-
sions termed divergence and curl or rotation.
Definition 67. Let F (x, y, z) = f1 (x, y, z)i + f2 (x, y, z)j + f3 (x, y, z)k be a
vector field with components f1 , f2 , f3 defined on some open subset D of R3
which we assume to possess partial derivatives.
(a) The divergence of F , denoted by divF or ∇.F (read as nabla dot F), is
defined as
divF = ∂f1/∂x + ∂f2/∂y + ∂f3/∂z. (3.58)

It is a scalar field with domain D.


(b) The curl of F , denoted by curlF or ∇ × F (read as nabla cross F), is
defined as
curlF = (∂f3/∂y − ∂f2/∂z)i + (∂f1/∂z − ∂f3/∂x)j + (∂f2/∂x − ∂f1/∂y)k. (3.59)
It is a vector field with domain D. In analogy with a 3 × 3
determinant, the curl can be expressed symbolically as a determinant whose
rows are (i, j, k), (∂/∂x, ∂/∂y, ∂/∂z) and (f1, f2, f3); this is a traditional way to
memorize the definition of the curl. Alternatively, the term rotation is used
instead of curl, and rotF is written instead of curlF.
In contrast to the gradient, the meanings of divergence and curl cannot be
explained from this definition in a direct and intuitive manner. The notion of
divergence is best understood in context with the divergence theorem of Gauss
(Theorem 41) and will be explained in Subsection 3.6.3. Likewise, the theorem
of Stokes (Theorem 44) helps to clarify the meaning of the curl, see Subsection
3.6.5. We can see that for a rotating body, angular velocity is proportional to
the curl of the tangential velocity. Compute ∇. F (the divergence) and ∇ × F
(the curl) of the vector field

F (x, y, z) = 2xyi + xey j + zk.

Solution: The components of F are f1 (x, y, z) = 2xy, f2 = xey , f3 = 2z.


Therefore
(divF)(x, y, z) = (∂f1/∂x + ∂f2/∂y + ∂f3/∂z)(x, y, z) = 2y + xe^y + 2.
To evaluate
curlF = (∂f3/∂y − ∂f2/∂z)i + (∂f1/∂z − ∂f3/∂x)j + (∂f2/∂x − ∂f1/∂y)k,
we compute

(∂f3/∂y − ∂f2/∂z)(x, y, z) = 0 − 0 = 0, (∂f1/∂z − ∂f3/∂x)(x, y, z) = 0 − 0 = 0,

(∂f2/∂x − ∂f1/∂y)(x, y, z) = e^y − 2x,
and obtain

curlF (x, y, z) = 0i + 0j + (ey − 2x)k = (ey − 2x)k.
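The divergence and curl just computed can be verified mechanically. A short SymPy sketch (an illustrative tool choice):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f1, f2, f3 = 2*x*y, x*sp.exp(y), 2*z        # components of the field F

div_F = sp.diff(f1, x) + sp.diff(f2, y) + sp.diff(f3, z)     # formula (3.58)
curl_F = sp.Matrix([sp.diff(f3, y) - sp.diff(f2, z),         # formula (3.59)
                    sp.diff(f1, z) - sp.diff(f3, x),
                    sp.diff(f2, x) - sp.diff(f1, y)])
print(div_F)       # 2y + x e^y + 2
print(curl_F.T)    # (0, 0, e^y - 2x)
```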

Example 196. (i) Let ϕ be a scalar field which possesses continuous first and
second partial derivatives. Show that ∇ × (∇ϕ) = 0 or curl(∇ϕ) = 0, that is,
the curl of the gradient of ϕ is identically equal to the zero vector field.

(ii) Let F = f1 i + f2 j + f3 k be a vector field which has continuous first and


second partial derivatives. Show that

div curlF = 0 or ∇.(∇ × F) = 0.

This means that the divergence of the curl of F is identically equal to the zero
scalar field.
Solution: (i) In order to compute the curl of ∇ϕ, we apply (3.59) to the
vector field F = ∇ϕ, that is, f1 = ∂ϕ/∂x, f2 = ∂ϕ/∂y, f3 = ∂ϕ/∂z. Inserting those
expressions into (3.59) we obtain

∇ × (∇ϕ) = (∂²ϕ/∂y∂z − ∂²ϕ/∂z∂y)i + (∂²ϕ/∂z∂x − ∂²ϕ/∂x∂z)j + (∂²ϕ/∂x∂y − ∂²ϕ/∂y∂x)k.
We know that we can interchange the sequence of partial derivatives if the
second partial derivatives are continuous. Therefore

∂²ϕ/∂y∂z = ∂²ϕ/∂z∂y, ∂²ϕ/∂z∂x = ∂²ϕ/∂x∂z, ∂²ϕ/∂x∂y = ∂²ϕ/∂y∂x,
so all components of the vector field ∇ × (∇ϕ) are zero, and thus identically
equal to the zero vector field.
(ii) By the definition
∇ × F = (∂f3/∂y − ∂f2/∂z)i + (∂f1/∂z − ∂f3/∂x)j + (∂f2/∂x − ∂f1/∂y)k.
Forming the divergence on both sides we get

div(∇ × F) = ∂/∂x (∂f3/∂y − ∂f2/∂z) + ∂/∂y (∂f1/∂z − ∂f3/∂x) + ∂/∂z (∂f2/∂x − ∂f1/∂y)

= ∂²f3/∂x∂y − ∂²f2/∂x∂z + ∂²f1/∂y∂z − ∂²f3/∂y∂x + ∂²f2/∂z∂x − ∂²f1/∂z∂y

= [∂²f3/∂x∂y − ∂²f3/∂y∂x] + [∂²f2/∂z∂x − ∂²f2/∂x∂z] + [∂²f1/∂y∂z − ∂²f1/∂z∂y] = 0,
because for the same reason as in (i),

∂²f3/∂x∂y = ∂²f3/∂y∂x, ∂²f2/∂z∂x = ∂²f2/∂x∂z, ∂²f1/∂y∂z = ∂²f1/∂z∂y.
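Both identities of Example 196 can be spot-checked on concrete smooth fields (a check on examples, not a proof). The SymPy sketch below uses an arbitrarily chosen ϕ and F (assumptions of this illustration):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(f):
    return sp.Matrix([sp.diff(f, v) for v in (x, y, z)])

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

def curl(F):
    return sp.Matrix([sp.diff(F[2], y) - sp.diff(F[1], z),
                      sp.diff(F[0], z) - sp.diff(F[2], x),
                      sp.diff(F[1], x) - sp.diff(F[0], y)])

phi = sp.sin(x*y) + z**3 * sp.exp(y)              # arbitrary smooth scalar field
F = sp.Matrix([x*y*z, sp.cos(y*z), x**2 - z])     # arbitrary smooth vector field

print(curl(grad(phi)).T)            # (0, 0, 0): curl of a gradient vanishes
print(sp.simplify(div(curl(F))))    # 0: divergence of a curl vanishes
```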
Example 197. Consider the vector field r(x, y, z) = xi+yj +zk (the identity
mapping on R3 ). We let r =k r k, that is, we also consider the scalar field r
given by r(x, y, z) =k r(x, y, z) k.
(a) Let n be any integer (positive or negative). Prove that ∇(rn ) = nrn−2 r.
(b) Let ϕ be a real valued function of one variable. Prove that curl(ϕ(r)r) = 0.
Solution: (a) We have r(x, y, z) = ‖r(x, y, z)‖ = (x² + y² + z²)^(1/2), so
rⁿ(x, y, z) = (x² + y² + z²)^(n/2). This gives

∂(rⁿ)/∂x (x, y, z) = (n/2)(x² + y² + z²)^(n/2 − 1) · 2x = nx rⁿ⁻²(x, y, z),
and similarly
∂(rⁿ)/∂y = ny rⁿ⁻², ∂(rⁿ)/∂z = nz rⁿ⁻².
∂y ∂z
Thus, in abbreviated notation,
∇rⁿ = ∂(rⁿ)/∂x i + ∂(rⁿ)/∂y j + ∂(rⁿ)/∂z k = nx rⁿ⁻² i + ny rⁿ⁻² j + nz rⁿ⁻² k
∂x ∂y ∂z

= nrn−2 (xi + yj + zk) = nrn−2 r.


(b) Recall the formula for the curl of a vector field F ,
curlF = (∂f3/∂y − ∂f2/∂z)i + (∂f1/∂z − ∂f3/∂x)j + (∂f2/∂x − ∂f1/∂y)k.
In the present case (again we use abbreviated notation) F = ϕ(r)r = ϕ(r)xi+
ϕ(r)yj + ϕ(r)zk, that is,

f1 = ϕ(r)x, f2 = ϕ(r)y, f3 = ϕ(r)z,


and r(x, y, z) = (x² + y² + z²)^(1/2). Hence, using the chain rule,

∂f3/∂y − ∂f2/∂z = zϕ'(r) ∂r/∂y − yϕ'(r) ∂r/∂z

= yzϕ'(r)(x² + y² + z²)^(−1/2) − yzϕ'(r)(x² + y² + z²)^(−1/2) = 0.
Similarly we obtain
∂f1/∂z − ∂f3/∂x = 0, ∂f2/∂x − ∂f1/∂y = 0.
∂z ∂x ∂x ∂y
This proves that curl(ϕ(r)r) = 0.
Finally we note that divergence and curl satisfy the property of linearity,
that is,
div(F + G) = divF + divG, div(αF) = α divF,
curl(F + G) = curlF + curlG, curl(αF) = α curlF.
Definition 68. A parametrization r : D → Σ is called regular, if it is con-
tinuously differentiable on the interior of D, and if the vectors ∂u r(u, v) and
∂v r(u, v) are not parallel at all interior points (u, v) of D. A surface Σ is called
smooth if it has a regular parametrization.

Example 198. Consider the vertical cylinder with height H and a circular
base of radius R.
Find the plane tangent to the side surface at the point P0 = ((1/√2)R, (1/√2)R, 1).

Solution: The parametrization r(u, v) = R cos u i + R sin u j + vk yields P0 =
r(u0, v0) with u0 = π/4 and v0 = 1, as well as

∂u r(u, v) = −R sin ui + R cos uj, ∂v r(u, v) = k.

The tangent plane for Σ at P0 is therefore spanned by the vectors

∂u r(π/4, 1) = −(R/√2)i + (R/√2)j, ∂v r(π/4, 1) = k.

Let P0 be a point on a surface Σ. Any vector which is perpendicular to the


tangent plane for Σ at P0 is called a normal vector (or simply a normal) for
Σ at P0 , or a vector normal to Σ at P0 . Note that any scalar multiple of a
normal vector is again a normal vector. A normal vector of length 1 is called
a unit normal. Unit normals are commonly denoted by n. Since the vector
product x × y of two vectors is perpendicular to both vectors x and y, we
see that the vector

N (P0 ) = ∂u r(u0 , v0 ) × ∂v r(u0 , v0 )

is a normal vector for Σ at P0 = r(u0 , v0 ). Let us return to the special situation


where Σ is given as the graph of a (continuously differentiable) function S,

r(x, y) = (x, y, S(x, y)) = xi + yj + S(x, y)k, (x, y) ∈ D. (3.60)

In this case the vectors

∂x r(x0 , y0 ) = i + ∂x S(x0 , y0 )k, ∂y r(x0 , y0 ) = j + ∂y S(x0 , y0 )k, (3.61)

span the tangent plane at P0 = (x0 , y0 , S(x0 , y0 )). Note that the parametriza-
tion e.g., (3.60) is regular, since the vectors in (3.61) are not parallel. The
vector

N(P0) = ∂x r(x0, y0) × ∂y r(x0, y0) = −∂x S(x0, y0)i − ∂y S(x0, y0)j + k (3.62)

is normal to Σ at P0 . A corresponding unit normal is given by


n = (1/ν) N(P0) = (1/ν)(−∂x S(x0, y0)i − ∂y S(x0, y0)j + k), (3.63)

where

ν = √(1 + (∂x S(x0, y0))² + (∂y S(x0, y0))²).
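For a concrete graph surface, the normal (3.62) is easy to compute. The sketch below (SymPy, an illustrative tool choice) uses the assumed example S(x, y) = x² + y²:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
S = x**2 + y**2                        # an assumed example surface z = S(x, y)

# Tangent vectors (3.61) of the graph parametrization r(x, y) = (x, y, S(x, y))
r_x = sp.Matrix([1, 0, sp.diff(S, x)])
r_y = sp.Matrix([0, 1, sp.diff(S, y)])

# Normal vector (3.62): the cross product of the tangent vectors
N = r_x.cross(r_y)
print(N.T)                             # (-2x, -2y, 1) = -S_x i - S_y j + k
print(N.dot(r_x), N.dot(r_y))          # both 0: N is perpendicular to the tangent plane
```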

As we know from geometry, any plane in the space R3 can be described by an


equation involving a vector perpendicular to it. In our case this means that

the tangent plane to Σ at P0 is equal to the set of all points (x, y, z) which
satisfy the equation

−∂x S(x0 , y0 )(x − x0 ) − ∂y S(x0 , y0 )(y − y0 ) + (z − z0 ) = 0, (3.64)

or alternatively

z − z0 = ∂x S(x0 , y0 )(x − x0 ) + ∂y S(x0 , y0 )(y − y0 ). (3.65)

As an alternative to the parametric description (3.60), we can describe a
surface as a level set in the form

Σ = {(x, y, z) : f(x, y, z) = c}. (3.66)

3.4 Integration in Vector Fields


In this section we discuss integrals of vector fields over curves and surfaces.
Since they describe important aspects of processes taking place in space and
time, they constitute another basic tool for solving problems of science and
engineering, see also Section 3.6. There are two subsections of this section
dealing respectively with line integrals and surface integrals.

3.4.1 Line Integrals


Suppose a curve C in R3 is given by the parametric equations

x = x(t), y = y(t), z = z(t), for a ≤ t ≤ b. (3.67)

The resulting vector function r : [a, b] → R3 with

r(t) = (x(t), y(t), z(t)) = x(t)i + y(t)j + z(t)k, t ∈ [a, b] (3.68)

is called a parametrization of the curve C. We will think of C not only as


a geometric locus of points (x(t), y(t), z(t)) but also as having a direction or
an orientation induced by its parametrization. The point r(a) = (x(a), y(a), z(a)) is
called the initial point while r(b) = (x(b), y(b), z(b)) is called the terminal
(or end) point. A curve is called closed if r(a) = r(b), that is, if initial and
terminal points coincide. In Figures 3.8a and 3.8b we display curves as the
range of r, that is, as the set of all points r(t) = (x(t), y(t), z(t)) as t varies
over the parameter interval. A curve C is called continuous if its parametriza-
tion r is a continuous function (or equivalently, if its components in (3.67) are
continuous functions) on the parameter interval [a, b]. C is called differentiable

if r is differentiable, and C is called smooth if the derivative r0 is a continuous


function and the vectors r'(t) are non-zero for all t. A continuous curve C
is called piecewise smooth, if the parameter interval [a, b] admits a partition
a = t0 < · · · < tn = b such that C is smooth on every interval [tj−1, tj] of
this partition.
Line Integrals of Scalar Fields

FIGURE 3.8a: Curve Starting at t = 0

Let us first consider a straight rod of length L which we think of as a one


dimensional object with mass density ρ. If L is measured in centimeters and ρ
in grams per centimeter, its total mass will be ρL grams. Now let us consider a
curved rod as a curve C in space, whose points are given by the parametriza-
tion r(t) = (x(t), y(t), z(t)) for t ∈ [a, b], and assume that its mass density
varies along the rod as a function of (x, y, z). Then the total mass of the rod
is given by the integral
∫_a^b ρ(r(t)) ‖r'(t)‖ dt (3.69)

Indeed, a limit passage like the ones carried out in calculus (based on an approximation of the rod by straight pieces) shows that (3.69) is the correct
formula to compute the total mass of the rod. This motivates the following
definition.
Definition 69. (Line integral of scalar field) Let C be a smooth curve
given by (3.68) and let f be a continuous scalar field whose domain contains

FIGURE 3.8b: Closed Curve Directed from t = 0 to t = 2π

the curve C. Then the line integral of f over C is denoted by ∫_C f ds and defined by

∫_C f ds = ∫_a^b f(r(t)) ‖r'(t)‖ dt. (3.70)

Example 199. Evaluate the line integral

∫_C (x + y) ds,

where C is given by x(t) = y(t) = t, z(t) = t² for 0 ≤ t ≤ 2.


Solution: Since r(t) = (x(t), y(t), z(t)) = (t, t, t²), we get

r'(t) = (x'(t), y'(t), z'(t)) = (1, 1, 2t),

‖r'(t)‖ = √(x'² + y'² + z'²) = √(1 + 1 + (2t)²) = √(2 + 4t²).
For f(x, y, z) = x + y we get f(x(t), y(t), z(t)) = f(t, t, t²) = t + t = 2t, so

∫_C (x + y) ds = ∫_0^2 2t√(2 + 4t²) dt = [(1/6)(2 + 4t²)^(3/2)]_0^2 = (54√2 − 2√2)/6 = 26√2/3.

In the special case f = 1, the line integral (3.70) equals the length of the curve
C. Therefore, the function
s(t) = ∫_0^t √(x'² + y'² + z'²) dτ = ∫_0^t ‖r'(τ)‖ dτ (3.71)

yields the length of C from the initial point r(a) up to the point r(t) and is
called the arc length of the curve C; compare the corresponding situation
in two dimensions. From (3.71) we see that s'(t) = ‖r'(t)‖. This explains the
notation ∫_C f ds, where ds stands for s'(t)dt or ‖r'(t)‖ dt. Accordingly, ds is
termed the line element.


Line Integrals of Vector Fields
We consider a vector field F with components f1 , f2 and f3 ,
F (x, y, z) = f1 (x, y, z)i + f2 (x, y, z)j + f3 (x, y, z)k, (3.72)
whose domain contains the curve C, that is, the points r(t) belong to the
domain of F for all t ∈ [a, b].

Definition 70. (Line integral of vector field) Let C be a smooth curve


given by (3.68) and let F be a continuous vector field of the form (3.72) whose
domain contains the curve C. Then the line integral of F over C is denoted
by ∫_C F.dr and defined by

∫_C F.dr = ∫_a^b F(r(t)).r'(t) dt. (3.73)

When C is closed, one also writes

∮_C F.dr

instead of ∫_C F.dr. In this case, the value of the line integral is also called the
circulation of F around C.
We give some explanations concerning the integral on the right hand side of
(3.73). The integrand is a real valued function of the single variable t, obtained
as the scalar product of the vector F(r(t)) (the value of the vector field at the
curve point r(t)) and the vector r'(t) (a vector tangent to the curve at that point).
If we expand the scalar product into components according to its definition,
we obtain, since r'(t) = (x'(t), y'(t), z'(t)),
∫_C F.dr = ∫_a^b [f1(r(t))x'(t) + f2(r(t))y'(t) + f3(r(t))z'(t)] dt

= ∫_a^b [f1(x(t), y(t), z(t))x'(t) + f2(x(t), y(t), z(t))y'(t) + f3(x(t), y(t), z(t))z'(t)] dt. (3.74)
One may ask why the line integral ∫_C F.dr is defined in this way. In fact, it
expresses certain quantities which are important in applications, for example
the mechanical work done by a force field F , moving a point mass along the
curve C from the initial to the terminal point. This will be elaborated in more
detail in Subsection 3.6.2.

Example 200. Let the curve C be given by r(t) = (cos t)i − (sin t)j + tk
on the interval 0 ≤ t ≤ π. Evaluate the integral ∫_C F.dr where F(x, y, z) =
yi − xj + 2k.
Solution: We use formula (3.73) or (3.74). In order to form F (r(t)), we have to
insert cos t for x, -sin t for y and t for z. This gives F (r(t)) = (− sin t, − cos t, 2).
Moreover, r0 (t) = (− sin t, − cos t, 1). Computing the scalar product of those
two vectors and integrating, we get
Z Z π Z π
F.dr = F (r(t)).r0 (t)dt = (sin2 t + cos2 t + 2)dt = π + 2π = 3π.
C 0 0
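The result of Example 200 can be checked numerically. The following Python sketch (not part of the original text; the helper name `line_integral` is our own) approximates $\int_C F \cdot dr$ with a midpoint rule and a finite-difference tangent:

```python
import math

def line_integral(F, r, a, b, n=10000):
    """Approximate the line integral of F over the curve r([a, b]) with the
    midpoint rule; r'(t) is approximated by a central difference."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        dt = 1e-6
        p1, p2 = r(t - dt), r(t + dt)
        rp = [(q2 - q1) / (2 * dt) for q1, q2 in zip(p1, p2)]  # tangent r'(t)
        Fv = F(*r(t))
        total += sum(f * d for f, d in zip(Fv, rp)) * h
    return total

# Example 200: F = y i - x j + 2 k along r(t) = (cos t, -sin t, t), 0 <= t <= pi
F = lambda x, y, z: (y, -x, 2.0)
r = lambda t: (math.cos(t), -math.sin(t), t)
val = line_integral(F, r, 0.0, math.pi)
print(val, 3 * math.pi)   # both are approximately 9.4248
```

The same helper can be reused for any of the parameterized curves in this subsection.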
The line integral $\int_C F \cdot dr$ is also frequently denoted as
$$\int_C f_1(x,y,z)\,dx + f_2(x,y,z)\,dy + f_3(x,y,z)\,dz, \qquad (3.75)$$
or in shorter form as
$$\int_C f_1\,dx + f_2\,dy + f_3\,dz.$$
The definition of those expressions is, of course, the same as above in (3.73) or (3.74). If we formally replace x by x(t) and dx by $x'(t)\,dt$ (and analogously for y and z) in (3.75), we arrive at (3.74).
Example 201. Evaluate $\int_C xy\,dx - y\cos x\,dy$, where C is given by $x(t) = t^2$ and $y(t) = t$ on the interval $-2 \le t \le 5$.
Solution: We have $x'(t) = 2t$ and $y'(t) = 1$, therefore
$$\int_C xy\,dx - y\cos x\,dy = \int_{-2}^5 [xyx' - y(\cos x)y'](t)\,dt$$
$$= \int_{-2}^5 (t^2 \cdot t \cdot 2t - t\cos t^2)\,dt = \Big[\frac{2t^5}{5}\Big]_{-2}^5 - \Big[\frac{1}{2}\sin t^2\Big]_{-2}^5$$
$$= 2\Big(625 + \frac{32}{5}\Big) - \frac{1}{2}(\sin 25 - \sin 4) = \frac{6314}{5} - \frac{1}{2}(\sin 25 - \sin 4).$$
Example 202. Find the circulation of the field $F(x,y) = (-y+x)\,i + x\,j$ around the circle $x^2 + y^2 = 1$.
Solution: The parametric equation of the given circle is
$$r(t) = \cos t\,i + \sin t\,j, \quad 0 \le t \le 2\pi.$$
We have
$$F(r(t)) = (-\sin t + \cos t)\,i + \cos t\,j, \quad r'(t) = -\sin t\,i + \cos t\,j,$$
$$F(r(t)) \cdot r'(t) = \sin^2 t - \sin t\cos t + \cos^2 t = 1 - \sin t\cos t.$$
Therefore, the circulation of F around the circle (denoted as C) becomes
$$\oint_C F \cdot dr = \int_0^{2\pi} F(r(t)) \cdot r'(t)\,dt = \int_0^{2\pi} (1 - \sin t\cos t)\,dt = \Big[t - \frac{\sin^2 t}{2}\Big]_0^{2\pi} = 2\pi.$$
The result of the following example will be used later in the proof of the theorem of Green and Ostrogradski.
Example 203. Let C be the graph of a function $y = h(x)$, where $h : [a,b] \to \mathbb{R}$ is differentiable, and let F be a vector field whose second component is zero, that is, $F(x,y) = (f(x,y), 0)$ for some continuous real valued function f. Show that
$$\int_C F \cdot dr = \int_a^b f(x, h(x))\,dx. \qquad (3.76)$$
Solution: We parameterize the curve C as $r(t) = (t, h(t))$. Then $r'(t) = (1, h'(t))$ and
$$\int_C F \cdot dr = \int_a^b f(r(t)) \cdot 1 + 0 \cdot h'(t)\,dt = \int_a^b f(t, h(t))\,dt,$$
so (3.76) is proved. (Recall that it does not matter whether the integration variable is denoted by t or by x.)
The following properties hold for the line integral. Let F and G be continuous vector fields defined on some region in space containing a curve C, which is given as the range of some vector function r as above. Then
$$\int_C (F+G) \cdot dr = \int_C F \cdot dr + \int_C G \cdot dr, \qquad (3.77)$$
$$\int_C \alpha F \cdot dr = \alpha \int_C F \cdot dr, \quad \alpha \text{ being any scalar}. \qquad (3.78)$$
This means that the line integral is linear with respect to the vector fields, that is, the line integral of the sum of two vector fields (the scalar multiple of a vector field) is equal to the sum of the line integrals (the scalar multiple of the line integral). These formulas are a direct consequence of the definition of the line integral, using the corresponding properties of the ordinary integral. For piecewise smooth curves, the line integral is evaluated on each piece separately, as stated in the following definition.
Definition 71. (Line integral, piecewise smooth curves) Let C be a piecewise smooth curve given by (3.68) and let F be a continuous vector field of the form (3.72) whose domain contains the curve C. Then the line integral of F over C is defined by
$$\int_C F \cdot dr = \sum_{i=1}^n \int_{t_{i-1}}^{t_i} F(r(t)) \cdot r'(t)\,dt, \qquad (3.79)$$
where $a = t_0 < \dots < t_n = b$ is a partition of [a,b] such that C is smooth on every interval $[t_{i-1}, t_i]$.

Path Independence, Potential Functions and Conservative Fields


Definition 72. Let F be a vector field defined on an open region D in R3 . If
F = ∇ψ for some scalar function ψ defined on D, then ψ is called a potential
function (or simply potential) for F in D, and the vector field F is called
conservative.
We will explain in Remark 46 why the term “conservative” is used here.
Theorem 36. Let F be a continuous vector field defined on an open region D in $\mathbb{R}^3$ which possesses a potential function ψ in D. Let C be a smooth curve in D with initial point A and terminal point B. Then
$$\int_C F \cdot dr = \psi(B) - \psi(A). \qquad (3.80)$$

The proof will be given in Appendix C. Let us take a look at formula (3.80).
Given the potential ψ , the value of the right hand side depends only on the
points A and B. This means in particular that the value of the line integral on
the left hand side does not change if we replace C by a different curve C̃, as
long as C̃ has the same initial and terminal points as C. This property is called
path independence. Theorem 36 thus states that for conservative vector fields,
line integrals are path independent. We present another related definition.
Definition 73. A continuous vector field F defined on an open region D in $\mathbb{R}^3$ is called circulation free on D if we have
$$\oint_C F \cdot dr = 0 \qquad (3.81)$$
for every closed curve C in D.
When the curve C is closed, its initial and terminal points coincide, and the right hand side of (3.80) becomes zero. Therefore, a conservative field is circulation free by Theorem 36. Actually, the converse is true, too, so the following theorem holds.
Theorem 37. (Closed loop property of conservative fields) Let F be a continuous vector field defined on an open region D in $\mathbb{R}^3$. Assume that D is connected, that is, any two points in D can be connected by a smooth curve.
The following statements are equivalent:
1. F is conservative on D.
2. F is circulation free in D.
The proof will be given in Appendix C. Another way of stating Theorem
37 would be to say that a vector field F possesses a potential in D if and only
if the line integral over any closed curve in D is zero.

Example 204. Let $\psi(x,y,z) = xyz$ and let C be a smooth curve with initial point (−1, 6, 18) and terminal point (2, 12, −8). Find $\int_C F \cdot dr$, where $F = \nabla\psi$.
Solution: Setting A = (−1, 6, 18) and B = (2, 12, −8) in Theorem 36, we get
$$\int_C F \cdot dr = \psi(B) - \psi(A) = 2 \cdot 12 \cdot (-8) - (-1) \cdot 6 \cdot 18 = -192 + 108 = -84.$$
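Path independence makes this result easy to verify: any curve from A to B gives the same value. The sketch below (our own; not from the original text) integrates $\nabla\psi$ along a straight segment from A to B with a midpoint rule and compares the result with $\psi(B) - \psi(A)$:

```python
import math

# Example 204 check: for F = grad(psi) with psi(x, y, z) = xyz, the line
# integral depends only on the endpoints.  Integrate along a straight
# segment from A = (-1, 6, 18) to B = (2, 12, -8).
psi = lambda x, y, z: x * y * z
grad_psi = lambda x, y, z: (y * z, x * z, x * y)

A, B = (-1.0, 6.0, 18.0), (2.0, 12.0, -8.0)
r = lambda t: tuple(a + t * (b - a) for a, b in zip(A, B))
rp = tuple(b - a for a, b in zip(A, B))          # constant tangent vector

n = 20000
h, total = 1.0 / n, 0.0
for i in range(n):
    t = (i + 0.5) * h
    total += sum(f * d for f, d in zip(grad_psi(*r(t)), rp)) * h

print(total, psi(*B) - psi(*A))   # both are approximately -84
```

Replacing the segment by any other smooth curve from A to B leaves the value unchanged, which is exactly the path independence stated in Theorem 36.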

3.4.2 Surface Integrals

From Subsection 3.3.3 we know how a surface Σ in space is described by a parametrization $r : D \to \Sigma$ defined on some domain D of the plane. Our first task is to compute the area of the surface. As an introduction to the general formula, we consider the parallelogram with corners 0, x, y and x + y, where x and y are two nonparallel vectors in space. We know that its area equals $\|x \times y\|$. We interpret this expression in terms of the parametrization
$$r(u,v) = ux + vy, \quad \Sigma = r(D), \quad D = [0,1] \times [0,1].$$
Indeed, since
$$\partial_u r(u,v) = x, \quad \partial_v r(u,v) = y$$
are constant as functions of u and v, we have
$$\|x \times y\| = \int_0^1 \int_0^1 \|x \times y\|\,du\,dv = \iint_D \|\partial_u r(u,v) \times \partial_v r(u,v)\|\,dA,$$
so we have expressed the area in terms of a two-dimensional integral. If we just want to determine the area of a parallelogram, there is no need for such a complicated formula, but for a general surface, this is the way to go.
Definition 74. (Area of a surface) Let Σ be a smooth surface given as Σ = r(D) with a regular parametrization r. The area of Σ is defined as
$$A(\Sigma) = \iint_D \|\partial_u r(u,v) \times \partial_v r(u,v)\|\,dA. \qquad (3.82)$$

Since r is regular, the integral on the right hand side is well defined as the limit of the corresponding Riemann sums. Those sums can be interpreted as the areas of small parallelograms which approximate corresponding portions of Σ. (We do not carry out the details, which in fact would be rather cumbersome.) Thus, (3.82) is a natural definition of the area of an arbitrary curved surface. The most important special case arises when Σ is the graph of a function S.
Theorem 38. Let Σ be a smooth surface given as z = S(x,y), where S is a continuously differentiable function defined on some domain D of the plane. Then we have
$$A(\Sigma) = \iint_D \sqrt{1 + \partial_x S(x,y)^2 + \partial_y S(x,y)^2}\,dA. \qquad (3.83)$$

Verification: From Subsection 3.3.3 we know that with $r(x,y) = x\,i + y\,j + S(x,y)\,k$ we obtain
$$\partial_x r(x,y) \times \partial_y r(x,y) = -\partial_x S(x,y)\,i - \partial_y S(x,y)\,j + k.$$
Taking the length of this vector, we see that (3.83) is indeed a special case of (3.82). Formulas (3.82) and (3.83) are analogous to the formula for the length of a curve as the integral over the length of tangent vectors. Similarly, the definition of the surface integral of a scalar function f defined on Σ is analogous to that of a line integral as given in Definition 34.
Definition 75. (Surface integral) Let Σ be a smooth surface, given as Σ = r(D) with a regular parametrization r. Let f be a continuous scalar function defined on Σ. Then the surface integral of f over the surface Σ is denoted by $\iint_\Sigma f(x,y,z)\,d\sigma$ or simply $\iint_\Sigma f\,d\sigma$:
$$\iint_\Sigma f\,d\sigma = \iint_\Sigma f(x,y,z)\,d\sigma = \iint_D f(r(u,v))\,\|\partial_u r(u,v) \times \partial_v r(u,v)\|\,dA. \qquad (3.84)$$
For the case where Σ is the graph of a function S, we obtain the following
formula for the surface integral from (3.84) in the same manner as we obtained
the area in Theorem 38 from Definition 74.
Theorem 39. Let Σ be a smooth surface given as z = S(x,y), where S is a continuously differentiable function defined on some domain D of the (x,y)-plane. Let f be a continuous scalar function defined on Σ. Then we have
$$\iint_\Sigma f\,d\sigma = \iint_D f(x,y,S(x,y))\,\sqrt{1 + \Big(\frac{\partial S}{\partial x}\Big)^2(x,y) + \Big(\frac{\partial S}{\partial y}\Big)^2(x,y)}\,dA. \qquad (3.85)$$

Remark 38. (i) In the special case f = 1 we obtain the area of the surface, $A(\Sigma) = \iint_\Sigma 1\,d\sigma$, as we can see from the definitions.
(ii) For a given surface Σ, it is possible to choose different parametrizations r of it. One can prove that this does not change the value of the area or of the surface integral.
(iii) Let a surface Σ consist of smooth components $\Sigma_1, \Sigma_2, \dots, \Sigma_n$ which are mutually disjoint or intersect each other only in a set of zero area, for example along a curve. Such a surface is called piecewise smooth. The surface integral over Σ is then defined as
$$\iint_\Sigma f\,d\sigma = \iint_{\Sigma_1} f\,d\sigma + \dots + \iint_{\Sigma_n} f\,d\sigma. \qquad (3.86)$$
For example, the boundary Σ of a cube is not smooth (the normal vector jumps across its edges), but it is piecewise smooth, since it consists of six square pieces which are smooth.
(iv) The formal expression $d\sigma = \sqrt{1 + \partial_x S(x,y)^2 + \partial_y S(x,y)^2}\,dA$ is often called the surface element.

Example 205. Evaluate the surface integral $\iint_\Sigma f\,d\sigma$, where $f(x,y,z) = y^2$ and Σ is the part of the plane z = x given by $0 \le x \le 2$, $0 \le y \le 4$.
Solution: We have S(x,y) = x and $S : D \to \Sigma$ with $D = [0,2] \times [0,4]$, so
$$\sqrt{1 + \Big(\frac{\partial S}{\partial x}\Big)^2(x,y) + \Big(\frac{\partial S}{\partial y}\Big)^2(x,y)} = \sqrt{1 + 1^2 + 0^2} = \sqrt{2},$$
and therefore
$$\iint_\Sigma f\,d\sigma = \iint_D y^2\sqrt{2}\,dA = \int_0^2 \int_0^4 y^2\sqrt{2}\,dy\,dx = \sqrt{2}\int_0^2 dx \cdot \int_0^4 y^2\,dy = \frac{128\sqrt{2}}{3}.$$
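Since the surface element here is the constant √2, the surface integral reduces to a plain double integral over D. A short numerical check (our own sketch, not from the original text) with a midpoint rule:

```python
import math

# Example 205 check: integral of y^2 over the plane z = x, 0 <= x <= 2,
# 0 <= y <= 4.  With S(x, y) = x the surface element is sqrt(2) dA, so we
# integrate y^2 * sqrt(2) over the parameter domain D.
f = lambda x, y: y * y * math.sqrt(2.0)   # integrand including surface element

nx, ny = 400, 800
hx, hy = 2.0 / nx, 4.0 / ny
total = sum(f((i + 0.5) * hx, (j + 0.5) * hy) * hx * hy
            for i in range(nx) for j in range(ny))

print(total, 128 * math.sqrt(2.0) / 3)   # both are approximately 60.3398
```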
Example 206. Evaluate the surface integral $\iint_\Sigma f\,d\sigma$, where $f(x,y,z) = z^2$ and Σ is the portion of the boundary surface of the vertical cone $z = \sqrt{x^2 + y^2}$ which lies between the planes z = 1 and z = 2.
Solution: We present two different approaches. For the first one, we use a parametrization adapted to the cone, namely
$$r(u,v) = u\cos v\,i + u\sin v\,j + u\,k,$$
so Σ = r(D) with $D = \{(u,v) : 1 \le u \le 2,\ 0 \le v \le 2\pi\}$. In order to use Definition 75 we compute
$$\partial_u r(u,v) = \cos v\,i + \sin v\,j + k, \quad \partial_v r(u,v) = -u\sin v\,i + u\cos v\,j.$$
We get
$$(\partial_u r \times \partial_v r)(u,v) = (0 - u\cos v)\,i - (0 - (-u\sin v))\,j + (u\cos^2 v + u\sin^2 v)\,k = -u\cos v\,i - u\sin v\,j + u\,k$$
and moreover
$$\|(\partial_u r \times \partial_v r)(u,v)\| = \sqrt{u^2\cos^2 v + u^2\sin^2 v + u^2} = \sqrt{2u^2} = u\sqrt{2}.$$
Inserting this into (3.84), we get, since $f(r(u,v)) = u^2$,
$$\iint_\Sigma f\,d\sigma = \iint_D u^2 \cdot u\sqrt{2}\,dA = \sqrt{2}\int_0^{2\pi}\int_1^2 u^3\,du\,dv = \frac{15\sqrt{2}}{2}\pi.$$
For the second approach, we use Theorem 39 with $S(x,y) = \sqrt{x^2 + y^2}$ and the annular domain $D = \{(x,y) : 1 \le \sqrt{x^2 + y^2} \le 2\}$. We have $f(x,y,S(x,y)) = S(x,y)^2 = x^2 + y^2$, so
$$\iint_\Sigma f\,d\sigma = \iint_D (x^2 + y^2)\sqrt{1 + \Big(\frac{\partial S}{\partial x}\Big)^2(x,y) + \Big(\frac{\partial S}{\partial y}\Big)^2(x,y)}\,dA.$$
The partial derivatives of S are
$$\frac{\partial S}{\partial x}(x,y) = \frac{x}{\sqrt{x^2 + y^2}}, \quad \frac{\partial S}{\partial y}(x,y) = \frac{y}{\sqrt{x^2 + y^2}},$$
so
$$\sqrt{1 + \Big(\frac{\partial S}{\partial x}\Big)^2(x,y) + \Big(\frac{\partial S}{\partial y}\Big)^2(x,y)} = \sqrt{1 + \frac{x^2}{x^2 + y^2} + \frac{y^2}{x^2 + y^2}} = \sqrt{2}.$$
This yields
$$\iint_\Sigma f\,d\sigma = \iint_D (x^2 + y^2)\sqrt{2}\,dA.$$
Since D is an annular region, it is best to evaluate this two-dimensional integral in polar coordinates. Using well known results of polar coordinates, we get with $G = \{(r,\theta) : 1 \le r \le 2,\ 0 \le \theta \le 2\pi\}$ that
$$\iint_\Sigma f\,d\sigma = \iint_D (x^2 + y^2)\sqrt{2}\,dA = \sqrt{2}\iint_G (r^2\cos^2\theta + r^2\sin^2\theta)\,r\,dr\,d\theta = \sqrt{2}\iint_G r^3\,dr\,d\theta,$$
which is the same as above.
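Both approaches can be cross-checked numerically. In the cone parametrization the full integrand, including the surface element, is $u^3\sqrt{2}$; the sketch below (our own, not from the original text) sums it over a midpoint grid on D:

```python
import math

# Example 206 check: with the cone parametrization the integrand is
# f(r(u, v)) * |du r x dv r| = u^2 * u*sqrt(2) over 1 <= u <= 2, 0 <= v <= 2*pi.
nu, nv = 400, 400
hu, hv = 1.0 / nu, 2 * math.pi / nv
total = sum((1 + (i + 0.5) * hu) ** 3 * math.sqrt(2.0) * hu * hv
            for i in range(nu) for j in range(nv))

print(total, 15 * math.pi * math.sqrt(2.0) / 2)   # both are approximately 33.3216
```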
Example 207. Evaluate the surface integral $\iint_\Sigma f\,d\sigma$, where $f(x,y,z) = x^2$ and Σ is the upper half of the sphere $x^2 + y^2 + z^2 = a^2$.
Solution: First we apply Theorem 39. We have $S(x,y) = \sqrt{a^2 - x^2 - y^2}$ and
$$\frac{\partial S}{\partial x}(x,y) = -\frac{x}{\sqrt{a^2 - x^2 - y^2}}, \quad \frac{\partial S}{\partial y}(x,y) = -\frac{y}{\sqrt{a^2 - x^2 - y^2}}.$$
The integration domain becomes the circular disk $D = \{(x,y) : 0 \le x^2 + y^2 \le a^2\}$, and therefore
$$\iint_\Sigma f\,d\sigma = \iint_D x^2 \frac{a}{\sqrt{a^2 - x^2 - y^2}}\,dA.$$
Again, this two-dimensional integral is best evaluated in polar coordinates, and Theorem 39 yields
$$\iint_\Sigma f\,d\sigma = \int_0^{2\pi}\int_0^a r^2\cos^2\theta \cdot \frac{a}{\sqrt{a^2 - r^2}}\,r\,dr\,d\theta = a\int_0^{2\pi}\cos^2\theta\,d\theta \cdot \int_0^a \frac{r^3}{\sqrt{a^2 - r^2}}\,dr.$$
The rightmost integral can be evaluated with the substitution $r = a\sin t$, $dr = a\cos t\,dt$, and (we omit the details)
$$\int_0^a \frac{r^3}{\sqrt{a^2 - r^2}}\,dr = \int_0^{\pi/2} a^3\sin^3 t\,dt = \frac{2}{3}a^3.$$

We finally obtain
$$\iint_\Sigma f\,d\sigma = a\int_0^{2\pi}\cos^2\theta\,d\theta \cdot \frac{2}{3}a^3 = \frac{2}{3}\pi a^4.$$
An alternative solution arises from the use of spherical coordinates,
$$r(u,v) = a\sin u\cos v\,i + a\sin u\sin v\,j + a\cos u\,k.$$
Here Σ = r(D) with $D = \{(u,v) : 0 \le u \le \frac{\pi}{2},\ 0 \le v \le 2\pi\}$. A straightforward but somewhat lengthy computation yields
$$\|(\partial_u r \times \partial_v r)(u,v)\| = a^2\sin u.$$
Since $f(x,y,z) = x^2$, we have $f(r(u,v)) = a^2\sin^2 u\cos^2 v$, and from (3.84) we get
$$\iint_\Sigma f\,d\sigma = \iint_D a^2\sin^2 u\cos^2 v \cdot a^2\sin u\,dA = a^4\int_0^{\pi/2}\sin^3 u\,du \cdot \int_0^{2\pi}\cos^2 v\,dv = \frac{2}{3}\pi a^4$$
as before.
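For a = 1 the value is $\frac{2}{3}\pi \approx 2.0944$, which the spherical parametrization makes easy to check numerically (our own sketch, not part of the original text):

```python
import math

# Example 207 check (a = 1): integral of x^2 over the upper hemisphere, using
# the spherical parametrization; the surface element is sin(u) du dv, so the
# integrand is sin(u)^2 * cos(v)^2 * sin(u).
nu, nv = 500, 500
hu, hv = (math.pi / 2) / nu, (2 * math.pi) / nv
total = 0.0
for i in range(nu):
    u = (i + 0.5) * hu
    for j in range(nv):
        v = (j + 0.5) * hv
        total += (math.sin(u) ** 2) * (math.cos(v) ** 2) * math.sin(u) * hu * hv

print(total, 2 * math.pi / 3)   # both are approximately 2.0944
```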

3.5 Fundamental Theorems of Vector Calculus


We discuss here three very important results of vector calculus. The first
result bears the names Green and Ostrogradski, in honor of the British sci-
entist George Green and the Ukrainian mathematician Mikhail Ostrogradski.
This theorem establishes a relationship between a double integral over a do-
main in a plane and a line integral over the boundary of that domain. The
second result which we present here is known as the divergence theorem of
Gauss in honor of the German mathematician and scientist Carl Friedrich
Gauss. This theorem relates a volume (triple) integral over a region in space
to a surface integral over the boundary surface of that region.
The third theorem is the Stokes theorem, named for Sir George Gabriel
Stokes, an Irish mathematician who worked at Cambridge University. This
theorem connects a surface integral to a line integral over the boundary curve
of the surface. Thus, all those theorems relate integrals over objects of different
(adjacent) dimensions. In some sense they can be viewed as extensions of the
fundamental theorem of calculus; recall that the latter relates an integral over
an interval (a one-dimensional object) to function values on the boundary (a
zero-dimensional object). There are numerous applications of this theorem in
science and engineering and some of them will be described in the next section.

3.5.1 Theorem of Green and Ostrogradski

Let C be a piecewise smooth curve in the plane $\mathbb{R}^2$ parameterized as r(t) = (x(t), y(t)) for $a \le t \le b$. Let C be closed, that is, r(b) = r(a) holds for the initial and terminal points. The curve C is called positively oriented if r(t) traverses C counterclockwise (anticlockwise) as t varies from a to b. If r(t) traverses C clockwise, then C is said to be negatively oriented. The parametrization $r(t) = (\cos t, \sin t)$ traverses the unit circle C anticlockwise as t varies from 0 to $2\pi$, and hence C is positively oriented by r.
On the other hand, the parametrization $r(t) = (-\cos t, \sin t)$ defines a negatively oriented curve C (whose graph is again the unit circle), since it traverses C clockwise. A non-closed curve C in the plane is called simple if it does not intersect itself, that is, if it has a parametrization $r : [a,b] \to \mathbb{R}^2$ which is one-to-one, so $r(t_1) = r(t_2)$ can hold only if $t_1 = t_2$. If we imagine the graph of a curve as a train track, the track does not cross itself. A closed curve is called simple if it does not intersect itself except for the initial and final point, that is, $r(t_1) = r(t_2)$ holds for $t_1 < t_2$ only if $t_1 = a$ and $t_2 = b$.
Recall that by $\oint_C F \cdot dr$ we denote the line integral of a vector field F = (f, g) over a closed curve C, and we also write it as
$$\oint_C F \cdot dr = \oint_C f(x,y)\,dx + g(x,y)\,dy.$$

Theorem 40. (Green-Ostrogradski) Let D be a bounded domain in the plane whose boundary C is a closed, simple and positively oriented curve. Let F = (f, g) be a vector field whose components are continuously differentiable. Then
$$\oint_C f(x,y)\,dx + g(x,y)\,dy = \iint_D \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)\,dA. \qquad (3.87)$$
The proof of this theorem will be given in Appendix C.

Remark 39. (i) The theorem relates a line integral, which is one-dimensional,
to an integral over a two-dimensional region.
(ii) The theorem is used for theoretical purposes as well as for the computation
of specific integrals. In particular it may happen that one of the integrals is
much easier to calculate directly than the other one.

Example 208. (a) A particle moves counterclockwise once around the triangle D with vertices (0,0), (4,0) and (1,6), under the influence of the force $F(x,y) = xy\,i + x\,j$. Calculate the work done by this force, if the units of length and force are meters and Newtons, respectively.
(b) Evaluate $\oint_C F \cdot dr$, where $F(x,y) = (x^2 - y)\,i + (\cos^2 y - e^{3y} + 4x)\,j$ and C is (the boundary of) any square with sides of length 5. Assume C is oriented counterclockwise.
Solution: (a) Let us denote by C the curve formed by the three sides of the triangle. The total work done by the force F equals the line integral $\oint_C F \cdot dr$. We have F = (f, g) with f(x,y) = xy, g(x,y) = x, so $\frac{\partial f}{\partial y} = x$ and $\frac{\partial g}{\partial x} = 1$. From (3.87) we know that
$$\oint_C F \cdot dr = \iint_D \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)(x,y)\,dA = \iint_D (1 - x)\,dA.$$
We compute the two-dimensional integral over the triangular region D as a double integral,
$$\iint_D (1 - x)\,dA = \int_0^1\int_0^{6x} (1 - x)\,dy\,dx + \int_1^4\int_0^{8-2x} (1 - x)\,dy\,dx$$
$$= \int_0^1 6x(1 - x)\,dx + \int_1^4 (8 - 2x)(1 - x)\,dx = -8.$$
Therefore, the work done equals −8 Newton meters.
(b) We have F = (f, g) with $f(x,y) = x^2 - y$, $g(x,y) = \cos^2 y - e^{3y} + 4x$, so $\frac{\partial f}{\partial y} = -1$ and $\frac{\partial g}{\partial x} = 4$. The two-dimensional region D is a square of side length 5. We calculate
$$\oint_C F \cdot dr = \iint_D \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)(x,y)\,dA = \iint_D 4 - (-1)\,dA = 5\iint_D dA.$$
Since $\iint_D dA$ equals the area of the square, which is equal to 25, we obtain $\oint_C F \cdot dr = 125$.
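The work in part (a) can also be obtained directly from the line integral, edge by edge. The sketch below (our own helper names, not part of the original text) integrates $xy\,dx + x\,dy$ along the three straight sides of the triangle with a midpoint rule:

```python
import math

# Example 208(a) check: the work around the triangle with vertices
# (0,0), (4,0), (1,6), traversed counterclockwise; by the Green-Ostrogradski
# theorem it equals the double integral of (1 - x) over D, which is -8.
verts = [(0.0, 0.0), (4.0, 0.0), (1.0, 6.0)]

def segment_integral(P, Q, n=20000):
    """Midpoint-rule approximation of the line integral f dx + g dy
    (with f = xy, g = x) along the straight segment from P to Q."""
    (x0, y0), (x1, y1) = P, Q
    dx, dy = x1 - x0, y1 - y0          # constant tangent on a straight edge
    h, total = 1.0 / n, 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = x0 + t * dx, y0 + t * dy
        total += (x * y * dx + x * dy) * h
    return total

work = sum(segment_integral(verts[i], verts[(i + 1) % 3]) for i in range(3))
print(work)   # approximately -8.0
```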

Example 209. (a) Use the Green-Ostrogradski theorem to evaluate the line integral $\oint_C y^2\,dx + x^2\,dy$, where C is the boundary of the square with vertices (0,0), (1,0), (0,1) and (1,1), oriented counterclockwise. Check the answer by evaluating the line integral directly.
(b) Do the same for $\oint_C xy\,dx + (y + x)\,dy$, where C is the unit circle $x^2 + y^2 = 1$, oriented counterclockwise.
(c) Verify the validity of the Green-Ostrogradski theorem for the vector field $F = 2y\,i - x\,j$ and the curve C taken as the circle of radius 4 with center (1,3).
Solution: (a) Here $f(x,y) = y^2$, $g(x,y) = x^2$, so $\frac{\partial f}{\partial y} = 2y$ and $\frac{\partial g}{\partial x} = 2x$; moreover $D = [0,1] \times [0,1]$. Using (3.87) we get
$$\oint_C f(x,y)\,dx + g(x,y)\,dy = \iint_D \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)(x,y)\,dA = \int_0^1\int_0^1 (2x - 2y)\,dy\,dx = 1 - 1 = 0. \qquad (3.88)$$

For the line integral, we have to evaluate $\int y(t)^2 x'(t) + x(t)^2 y'(t)\,dt$ separately along all four sides of the square, traversed by a suitable parametrization r(t) = (x(t), y(t)). Along the side x = 0 we have $x(t) = 0 = x'(t)$, and along the side y = 0 we have $y(t) = 0 = y'(t)$, so the corresponding line integrals are zero. The side x = 1 is parameterized by r(t) = (x(t), y(t)) = (1, t) for $0 \le t \le 1$, so the line integral becomes
$$\int_0^1 t^2 \cdot 0 + 1^2 \cdot 1\,dt = 1.$$
Analogously, the line integral along y = 1 yields the value −1, so the overall integral gives 0 + 0 + 1 − 1 = 0, which is the same as the result in (3.88).
(b) We have f(x,y) = xy, g(x,y) = y + x, so $\frac{\partial f}{\partial y} = x$ and $\frac{\partial g}{\partial x} = 1$; moreover $D = \{(x,y) : 0 \le x^2 + y^2 \le 1\}$ is the unit disk. Thus with F = (f, g) we obtain
$$\oint_C F \cdot dr = \iint_D \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)(x,y)\,dA = \iint_D (1 - x)\,dA.$$
Using polar coordinates we get, according to Theorem 40,
$$\iint_D (1 - x)\,dA = \int_0^{2\pi}\int_0^1 (1 - r\cos\theta)\,r\,dr\,d\theta = \int_0^{2\pi}\Big(\frac{1}{2} - \frac{1}{3}\cos\theta\Big)\,d\theta = \pi.$$
In order to compute $\oint_C F \cdot dr$ directly, we parameterize C by r(t) = (x(t), y(t)) with $x(t) = \cos t$, $y(t) = \sin t$, where $0 \le t \le 2\pi$. We obtain
$$\oint_C xy\,dx + (y + x)\,dy = \int_0^{2\pi} [xyx' + (y + x)y'](t)\,dt$$
$$= \int_0^{2\pi} [\cos t\sin t(-\sin t) + (\sin t + \cos t)\cos t]\,dt$$
$$= -\int_0^{2\pi}\cos t\sin^2 t\,dt + \int_0^{2\pi}\sin t\cos t\,dt + \int_0^{2\pi}\cos^2 t\,dt = 0 + 0 + \pi = \pi.$$
Recall that $\int_0^{2\pi}\cos^2 t\,dt = \pi$.
(c) We have F = (f, g) with f(x,y) = 2y, g(x,y) = −x, so $\frac{\partial f}{\partial y} = 2$, $\frac{\partial g}{\partial x} = -1$, and D is the disk of radius 4 with center (1,3). We get (A(D) denotes the area of D)
$$\oint_C F \cdot dr = \iint_D \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)(x,y)\,dA = \iint_D (-3)\,dA = -3 \cdot A(D) = -3 \cdot 4^2\pi = -48\pi.$$


To evaluate $\oint_C F \cdot dr$ directly, we parameterize C by r(t) = (x(t), y(t)) with $x(t) = 1 + 4\cos t$, $y(t) = 3 + 4\sin t$, where $0 \le t \le 2\pi$. We obtain
$$\oint_C 2y\,dx - x\,dy = \int_0^{2\pi} [2yx' - xy'](t)\,dt$$
$$= \int_0^{2\pi} 2(3 + 4\sin t)(-4\sin t) - (1 + 4\cos t)4\cos t\,dt$$
$$= \int_0^{2\pi} (-24\sin t - 32\sin^2 t - 4\cos t - 16\cos^2 t)\,dt = 0 - 32\pi - 0 - 16\pi = -48\pi.$$
Recall that $\int_0^{2\pi}\sin^2 t\,dt = \int_0^{2\pi}\cos^2 t\,dt = \pi$.
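The direct computation in (c) is easy to reproduce numerically; the sketch below (our own, not part of the original text) sums the same integrand on a midpoint grid over one full period:

```python
import math

# Example 209(c) check: circulation of 2y dx - x dy around the circle of
# radius 4 centered at (1, 3); Green's theorem predicts -3 * area = -48*pi.
n = 100000
h, total = 2 * math.pi / n, 0.0
for i in range(n):
    t = (i + 0.5) * h
    x, y = 1 + 4 * math.cos(t), 3 + 4 * math.sin(t)
    xp, yp = -4 * math.sin(t), 4 * math.cos(t)
    total += (2 * y * xp - x * yp) * h

print(total, -48 * math.pi)   # both are approximately -150.796
```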

3.5.2 Divergence Theorem of Gauss

We consider the following situation in the space $\mathbb{R}^3$. Let D be an open set which is bounded by a surface Σ such that D lies completely inside Σ. For example, the open unit ball $\{x \in \mathbb{R}^3 : \|x\| < 1\}$ is bounded by and lies completely inside the unit sphere $\{x \in \mathbb{R}^3 : \|x\| = 1\}$; the same is true for the (full) cube whose boundary surface consists of its six sides. In the case of the ball the boundary surface (the sphere) is smooth and in the case of the cube the boundary surface is piecewise smooth. At every point x of a smooth surface, normal vectors can be defined (see Subsection 3.3.3).
In the present situation, we can distinguish between outer normals pointing away from D and inner normals pointing into D (compare Figure 3.9). Thus, while we have two unit normals (normals whose length equals 1) at each point x ∈ Σ, there is exactly one outer unit normal which we denote by n(x). In this way we obtain a vector field n which is defined on the boundary surface Σ. As a consequence of our definition of a smooth surface (Definition 32), the field n is continuous. When the boundary surface consists of smooth pieces (as in the case of the cube), the outer unit normal field n is continuous within each piece, but has jumps across their connecting boundaries (the edges and corners, in the case of the cube) where it is not defined.
Theorem 41. Let D be a domain in $\mathbb{R}^3$ with boundary Σ and outer unit normal field n as described above. Suppose that F is a vector field with values in $\mathbb{R}^3$ whose components are continuous and have continuous first partial derivatives in D and up to the boundary Σ. Then
$$\iint_\Sigma F \cdot n\,d\sigma = \iiint_D \operatorname{div} F\,dV. \qquad (3.89)$$

FIGURE 3.9: Unit Outward Normal

Remark 40. (i) A proof will be given in Appendix C.
(ii) The theorem relates an integral over a two-dimensional surface to an integral over a three-dimensional volume. The integral on the left side is a surface integral (see Definition 75), the integrand being the scalar function $F \cdot n = f_1 n_1 + f_2 n_2 + f_3 n_3$, where $f_j$ and $n_j$ are the component functions of the vector fields F and n. The integral on the right side is a volume integral, as treated in calculus, of the scalar function
$$\operatorname{div} F = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}.$$
(iii) Its interpretation and some applications will be discussed in Section 3.6. Indeed, the divergence theorem is a fundamental tool in the analysis of partial differential equations and of the phenomena described by them. Some identities useful for that purpose will be presented below, following the examples.
(iv) Equation (3.89) is sometimes helpful when one wants to compute surface or volume integrals, because the evaluation of one side may be more convenient than the evaluation of the other side.

Example 210. For each of the following data calculate the right hand side or the left hand side of Equation (3.89), whichever is convenient.
(i) $F(x,y,z) = x\,i + y\,j - z\,k$, Σ is the sphere of radius 4 centered at (1,1,1).
(ii) $F(x,y,z) = x^3\,i + y^3\,j + z^3\,k$, Σ is the sphere of radius 1 with center at the origin.
(iii) $F(x,y,z) = x^2\,i + y^2\,j + z^2\,k$, Σ is the rectangular box bounded by the coordinate planes x = 0, y = 0, z = 0 and the planes x = 6, y = 2 and z = 7.
Solution: In all cases, D denotes the region enclosed by Σ. Recall that
$$\operatorname{div} F = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}$$
where $F = f_1\,i + f_2\,j + f_3\,k$.
(i) We have $f_1(x,y,z) = x$, $f_2(x,y,z) = y$, $f_3(x,y,z) = -z$, so
$$\operatorname{div} F(x,y,z) = 1 + 1 - 1 = 1.$$
We evaluate the right hand side of (3.89) as (here, vol(D) denotes the volume of D, a ball of radius 4)
$$\iiint_D \operatorname{div} F\,dV = \iiint_D dV = \operatorname{vol}(D) = \frac{4}{3}\pi \cdot 4^3 = \frac{256}{3}\pi.$$

(ii) Here $f_1(x,y,z) = x^3$, $f_2(x,y,z) = y^3$, $f_3(x,y,z) = z^3$, so
$$\frac{\partial f_1}{\partial x}(x,y,z) = 3x^2, \quad \frac{\partial f_2}{\partial y}(x,y,z) = 3y^2, \quad \frac{\partial f_3}{\partial z}(x,y,z) = 3z^2,$$
$$\operatorname{div} F(x,y,z) = 3x^2 + 3y^2 + 3z^2.$$
The right hand side of (3.89) becomes
$$\iiint_D \operatorname{div} F\,dV = 3\iiint_D (x^2 + y^2 + z^2)\,dV.$$
Using spherical coordinates, we have already evaluated this integral in Example 8.7.4, so
$$\iiint_D \operatorname{div} F\,dV = 3 \cdot \frac{4}{5}\pi = \frac{12}{5}\pi.$$
(iii) $F(x,y,z) = x^2\,i + y^2\,j + z^2\,k$, so
$$\operatorname{div} F(x,y,z) = 2x + 2y + 2z.$$
Again we evaluate the right hand side of (3.89), this time with Fubini's theorem applied to the rectangular box,
$$\iiint_D \operatorname{div} F\,dV = 2\int_0^7\int_0^2\int_0^6 (x + y + z)\,dx\,dy\,dz$$
$$= 2\int_0^7\int_0^2 \Big[\frac{1}{2}x^2 + xy + xz\Big]_{x=0}^{x=6}\,dy\,dz = \int_0^7\int_0^2 (36 + 12y + 12z)\,dy\,dz$$
$$= \int_0^7 [36y + 6y^2 + 12zy]_{y=0}^{y=2}\,dz = \int_0^7 (72 + 24 + 24z)\,dz$$
$$= [96z + 12z^2]_{z=0}^{z=7} = 672 + 12 \cdot 49 = 1260.$$
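Since the integrand 2(x + y + z) is linear, even a coarse midpoint grid reproduces the triple integral exactly. A quick check (our own sketch, not part of the original text):

```python
# Example 210(iii) check: triple integral of div F = 2(x + y + z) over the
# box 0 <= x <= 6, 0 <= y <= 2, 0 <= z <= 7.  The midpoint rule is exact for
# a linear integrand, so the sum reproduces the value 1260 up to rounding.
nx, ny, nz = 12, 4, 14
hx, hy, hz = 6.0 / nx, 2.0 / ny, 7.0 / nz
total = 0.0
for i in range(nx):
    for j in range(ny):
        for k in range(nz):
            x, y, z = (i + 0.5) * hx, (j + 0.5) * hy, (k + 0.5) * hz
            total += 2 * (x + y + z) * hx * hy * hz

print(total)   # 1260.0 up to rounding
```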

Example 211. Verify the statement of the Gauss divergence theorem for the vector field $F(x,y,z) = x\,i + y\,j + z\,k$ and the cone whose interior is given by $D = \{(x,y,z) : 0 < z < 1,\ 0 < x^2 + y^2 < z^2\}$.

FIGURE 3.10: Boundary Surface

Solution: The boundary surface Σ of D consists of two parts, see Figure 3.10. Its lateral part $\Sigma_1$ is described by z = S(x,y) with $S(x,y) = \sqrt{x^2 + y^2}$ for $x, y \in [-1,1]$, and its upper part $\Sigma_2$ is the flat circular disk described by $x^2 + y^2 \le 1$ lying in the plane z = 1. The surface integral on the left hand side of (3.89) thus decomposes into two parts,
$$\iint_\Sigma F \cdot n\,d\sigma = \iint_{\Sigma_1} F \cdot n_1\,d\sigma + \iint_{\Sigma_2} F \cdot n_2\,d\sigma,$$
where $n_1$ and $n_2$ are the vector fields of outer unit normals to $\Sigma_1$ and $\Sigma_2$. According to (3.70),
$$N_1(x,y,z) = -\frac{\partial S}{\partial x}(x,y)\,i - \frac{\partial S}{\partial y}(x,y)\,j + k = -\frac{x}{z}\,i - \frac{y}{z}\,j + k$$
is a normal vector to $\Sigma_1$ at (x,y,z), where $z = \sqrt{x^2 + y^2}$; it points into D, so the corresponding outer unit normal $n_1$ is given by
$$n_1(x,y,z) = \frac{1}{\sqrt{2}}\Big(\frac{x}{z}\,i + \frac{y}{z}\,j - k\Big).$$
Moreover, $n_2 = k$ as $\Sigma_2$ is a flat horizontal surface. For the given vector field $F(x,y,z) = x\,i + y\,j + z\,k$ we now compute
$$\iint_\Sigma F \cdot n\,d\sigma = \iint_{\Sigma_1} F \cdot n_1\,d\sigma + \iint_{\Sigma_2} F \cdot n_2\,d\sigma$$
$$= \iint_{\Sigma_1} (x\,i + y\,j + z\,k) \cdot \frac{1}{\sqrt{2}}\Big(\frac{x}{z}\,i + \frac{y}{z}\,j - k\Big)\,d\sigma + \iint_{\Sigma_2} (x\,i + y\,j + z\,k) \cdot k\,d\sigma$$
$$= \frac{1}{\sqrt{2}}\iint_{\Sigma_1}\Big(\frac{x^2}{z} + \frac{y^2}{z} - z\Big)\,d\sigma + \iint_{\Sigma_2} z\,d\sigma = 0 + \iint_{\Sigma_2} 1\,d\sigma = \pi,$$
as $z^2 = x^2 + y^2$ on $\Sigma_1$, z = 1 on $\Sigma_2$ and the area of the disk $\Sigma_2$ equals π.
The right hand side of (3.89) becomes
$$\iiint_D \operatorname{div} F\,dV = \iiint_D (1 + 1 + 1)\,dV = 3\iiint_D 1\,dV = 3\operatorname{vol}(D).$$
Since the volume of the cone of height 1 and radius 1 is equal to $\frac{\pi}{3}$, the right hand side, too, is equal to π.
We now present Green’s identities. Here ∆ denotes the Laplace operator,

∂2f ∂2f ∂2f


∆f = + + .
∂x2 ∂y 2 ∂z 2
Theorem 42. (Green’s first identity) D, Σ and n as in Theorem 41. Let
f and g be continuous scalar fields whose first and second partial derivatives
are continuous in D and up to the boundary Σ. Then
Z Z Z Z Z
f ∇g.ndσ = (f ∆g + ∇f.∇g)dV. (3.90)
Σ D

One can derive Green’s first identity from the divergence theorem by choosing
F = f ∇g in (3.89).

Theorem 43. (Green's second identity) Let D, Σ, n, f and g be as in Theorem 42. Then
$$\iint_\Sigma (f\,\nabla g - g\,\nabla f) \cdot n\,d\sigma = \iiint_D (f\Delta g - g\Delta f)\,dV. \qquad (3.91)$$
Green's second identity is obtained from Green's first identity by interchanging the roles of f and g, and then subtracting the resulting formula from the original one. As a particular case of Green's identities, setting f = 1 we get
$$\iint_\Sigma \nabla g \cdot n\,d\sigma = \iiint_D \Delta g\,dV. \qquad (3.92)$$

3.5.3 Theorem of Stokes

Let $F(x,y,z) = f_1(x,y,z)\,i + f_2(x,y,z)\,j + f_3(x,y,z)\,k$ be a vector field whose component functions are continuously differentiable, let Σ be a smooth surface bounded by a piecewise smooth closed curve C, and let n be a vector field of unit normals to Σ whose orientation fits the orientation of C (this will be described below). The Stokes theorem relates the line integral of F over C to a surface integral involving curl F over Σ by the formula
$$\oint_C F \cdot dr = \iint_\Sigma (\operatorname{curl} F) \cdot n\,d\sigma. \qquad (3.93)$$
Line integrals of vector fields have been studied in Subsection 3.4.1 and surface integrals in Subsection 3.4.2. The integrand of the surface integral on the right side is a scalar field, obtained as the scalar product of the vector fields curl F and n. From Definition 27 we recall that the notation as a vector product or as a determinant
$$\operatorname{curl} F = \nabla \times F = \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f_1 & f_2 & f_3 \end{vmatrix} \qquad (3.94)$$
is a convenient way to memorize the component definition of the rotation curl F of the vector field F.
We assume that the surface Σ is given as z = S(x,y) with a function S defined on the corresponding domain D in the xy-plane, and we assume that S and its first and second partial derivatives are continuous. We have seen in (3.63) that
$$n = \frac{1}{\nu}(-\partial_x S\,i - \partial_y S\,j + k), \quad \nu = \sqrt{1 + (\partial_x S)^2 + (\partial_y S)^2} \qquad (3.95)$$
defines a unit normal at the point (x, y, S(x,y)) of Σ, if the right hand side is evaluated at (x,y) ∈ D. We fix a suitable orientation of the boundary curve C of Σ as follows. Let τ be the boundary of D, positively oriented by a parametrization $q : [a,b] \to \mathbb{R}^2$. Then
$$r(t) = q_1(t)\,i + q_2(t)\,j + S(q_1(t), q_2(t))\,k$$
defines a parametrization $r : [a,b] \to \mathbb{R}^3$ of C which orients C as required.

Theorem 44. (Stokes) Under the assumptions above we have
$$\oint_C F \cdot dr = \iint_\Sigma (\operatorname{curl} F) \cdot n\,d\sigma. \qquad (3.96)$$

Remark 41. (i) A proof will be given in Appendix C.


(ii) The theorem also holds for more general surfaces, that is, surfaces which
cannot be described as the graph of a function S.
(iii) The theorem relates a one-dimensional integral (the line integral) to a
two-dimensional integral (the surface integral).

Example 212. Verify the Stokes theorem by evaluating both sides of (3.96)
for the following data.
(a) F (x, y, z) = (x − y)i + (y − z)j + (z − x)k, the surface Σ is the portion of
the plane x + y + z = 1 which lies in the first octant.
(b) F (x, y, z) = (y − z)i + (z + x)j + (y − x)k, the surface Σ is the portion of
the paraboloid z = 9 − x2 − y 2 which lies above the xy-plane.
Solution: (a) The surface Σ forms a planar triangular region in space whose
vertices are the unit vectors i, j and k and the boundary curve C is the triangle
connecting those points and thus is piecewise smooth. The corresponding do-
main D in the xy-plane is the triangular region enclosed by x+y = 1,x = 0 and
y = 0. Thus, if we go through the corners of C in the sequence i → j → k → i,
the orientation corresponds to a positive orientation of τ , the boundary of D.
We parameterize the first piece C1 : i → j with r(t) = (1 − t)i + tj, t ∈ [0, 1].
For F (x, y, z) = (x − y)i + (y − z)j + (z − x)k, the line integral becomes
Z Z 1
F dr = F (r(t))r0 (t)dt
C1 0

Z 1
= ((1 − 2t)i + tj + (t − 1)k)(−i + j)dt
0
Z 1
1
= (3t − 1)dt = .
0 2
The other two pieces of C are parameterized by r(t) = (1 − t)j + tk and
r(t) = (1 − t)k + tj, t ∈ [0, 1], respectively. An analogous computation as
above shows that Z Z
1
F.dr = F.dr =
C2 C3 2
so I
1 1 1 3
F dr = + + = .
C 2 2 2 2
The parametrization S of Σ is given by S(x, y) = 1−x−y, so ∂x S = ∂Y S = −1
is constant. The unit normal according to (3.63) is
1
n = √ (i + j + k).
3
Moreover, curl F = i + j + k. Thus, both n and curl F are constant on Σ. Since
√(1 + (∂S/∂x)² + (∂S/∂y)²) = √3, the surface integral becomes

∬_Σ (curl F)·n dσ = ∬_Σ √3 dσ = √3 ∬_D √3 dA = 3/2,

because the area of the triangular region D equals 1/2.
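As a numerical cross-check (a sketch, not part of the book's text), the three edge integrals can be approximated by a midpoint rule; their sum reproduces the circulation 3/2 found above.

```python
# Midpoint-rule check of Example 212(a): the circulation of
# F = (x - y, y - z, z - x) around the triangle i -> j -> k -> i is 3/2.

def line_integral(F, r, rp, n=10000):
    """Approximate the integral of F(r(t)).r'(t) over t in [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        Fx, Fy, Fz = F(*r(t))
        dx, dy, dz = rp(t)
        total += (Fx * dx + Fy * dy + Fz * dz) * h
    return total

F = lambda x, y, z: (x - y, y - z, z - x)

# The three edges of the triangle with vertices i, j and k.
edges = [
    (lambda t: (1 - t, t, 0), lambda t: (-1, 1, 0)),   # i -> j
    (lambda t: (0, 1 - t, t), lambda t: (0, -1, 1)),   # j -> k
    (lambda t: (t, 0, 1 - t), lambda t: (1, 0, -1)),   # k -> i
]

circulation = sum(line_integral(F, r, rp) for r, rp in edges)
print(circulation)  # ≈ 1.5 = 3/2
```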
(b) The boundary curve C is the circle lying in the xy-plane with radius 3 and
304 Modern Engineering Mathematics
center at the origin and it can be parameterized as r(t) = 3 cos ti + 3 sin tj for
0 ≤ t ≤ 2π. We have

∮_C F·dr = ∫₀^{2π} F(r(t))·r′(t) dt = ∫₀^{2π} 9(−sin² t + cos² t) dt = 0.

Since the line integral equals 0, it does not matter which orientation we choose.
Moreover, we compute that curlF = 0, so
∬_Σ (curl F)·n dσ = ∬_Σ 0 dσ = 0.
Hence, we have verified the Stokes theorem for both sets of data.
Remark 42. A vector field F defined on some open set G of space R3 is
called irrotational in G if curl F = 0 at all points of G. This terminology is
motivated by the Stokes theorem, since the circulation ∮_C F·dr equals zero for
closed curves in G arise as boundaries, we then can conclude from Theorem
45 that curlF = 0 in G implies that F is conservative on G. This is the case,
for example, when G is the whole space R3 or when G is a ball.
On the other hand, if curl F(x, y, z) is not zero at some point (x, y, z), then
for small disks around this point whose normal is parallel to curl F(x, y, z),
we have ∮_C F·dr ≠ 0 and hence F cannot be conservative. However, note that
for some types of regions G there may exist closed curves C which are not
boundaries of such a surface Σ and have non-zero circulation (and hence, F
is not circulation free and not conservative on G according to Definitions 44
and 37, even though we might have curlF = 0 on G). Consider, for example,
the vector field

F(x, y, z) = −y/(x² + y²) i + x/(x² + y²) j.

Its domain G is the whole space R³ except the z-axis. It satisfies curl F = 0
on G, but ∮_C F·dr = 2π ≠ 0 for the circle C defined by x² + y² = 1. Indeed, if
Σ is a surface with boundary C, it must intersect the z-axis, so it cannot be
contained in G.
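This claim is easy to test numerically. The following sketch (an illustration, not part of the text) approximates the circulation of the vortex field around the unit circle and recovers 2π:

```python
import math

# The circulation of F = (-y/(x^2+y^2), x/(x^2+y^2)) around the unit
# circle is 2*pi even though curl F = 0 away from the z-axis.

def circulation(n=10000):
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        r2 = x * x + y * y
        Fx, Fy = -y / r2, x / r2
        dx, dy = -math.sin(t), math.cos(t)   # r'(t)
        total += (Fx * dx + Fy * dy) * h
    return total

print(circulation())  # ≈ 6.283185... = 2*pi
```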
Example 213. For the following data, evaluate ∮_C F·dr or ∬_Σ (curl F)·n dσ,
whichever is easier.
(a) F = yx²i − xy²j + z²k, the surface Σ is the hemisphere x² + y² + z² = 4, z ≥ 0.
(b) F = zi + xj + y²k, the surface Σ is the cone z = √(x² + y²) for 0 ≤ z ≤ 4.
Solution: (a) The boundary curve C of Σ can be described by r(t) = 2 cos t i +
2 sin t j for 0 ≤ t ≤ 2π, so along C we have

F(r(t))·r′(t) = −16 cos² t sin² t − 16 cos² t sin² t = −8 sin²(2t) = −4(1 − cos 4t),

and therefore

∮_C F·dr = ∫₀^{2π} −4(1 − cos 4t) dt = −8π.
(b) The parametric equation of C is r(t) = 4 sin t i + 4 cos t j + 4k for 0 ≤ t ≤ 2π,
so

F(r(t))·r′(t) = 16 cos t − 16 sin² t,

and therefore

∮_C F·dr = ∫₀^{2π} (16 cos t − 16 sin² t) dt = −16π.
Note that the parametrization z = S(x, y) = √(x² + y²) is not differentiable at
0; we remark that the Stokes theorem remains valid in this particular case.
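Both answers can be spot-checked by approximating the line integrals directly from the boundary parametrizations (a sketch, not part of the text):

```python
import math

# Midpoint-rule check of Example 213: both circulations are computed
# directly from the boundary parametrizations of the two surfaces.

def circulation(F, r, rp, n=20000):
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        Fv, dv = F(*r(t)), rp(t)
        total += sum(a * b for a, b in zip(Fv, dv)) * h
    return total

# (a) F = y x^2 i - x y^2 j + z^2 k on the circle of radius 2.
Fa = lambda x, y, z: (y * x * x, -x * y * y, z * z)
ra = lambda t: (2 * math.cos(t), 2 * math.sin(t), 0.0)
rpa = lambda t: (-2 * math.sin(t), 2 * math.cos(t), 0.0)

# (b) F = z i + x j + y^2 k on the circle of radius 4 at height z = 4.
Fb = lambda x, y, z: (z, x, y * y)
rb = lambda t: (4 * math.sin(t), 4 * math.cos(t), 4.0)
rpb = lambda t: (4 * math.cos(t), -4 * math.sin(t), 0.0)

print(circulation(Fa, ra, rpa))  # ≈ -8*pi
print(circulation(Fb, rb, rpb))  # ≈ -16*pi
```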
3.6 Applications of Vector Calculus to Engineering Problems
A major part of science and engineering deals with the analysis of forces
such as the force of water on a dam or of air on a wing, stresses within
buildings and bridges, electric and magnetic forces in the power industry and
in computer hardware, and so on. These forces vary with position, time and
other circumstances. Vector calculus provides basic tools for analyzing and
understanding these situations.
In the first subsection we exhibit relationships between elements of vector
calculus and physical concepts like velocity, acceleration, momentum, angular
momentum and temperature. The second subsection presents applications of
line integrals. The third subsection is concerned with applications of surface
integrals, in particular with the flux of a vector field across a surface, which
leads to a better understanding of the notion of divergence. The fourth sub-
section is devoted to further applications of the divergence theorem of Gauss,
namely Archimedes’ principle, mass balance and heat flow. As an application
of the theorem of Stokes, we interpret the notion of curl in the fifth subsec-
tion. The final subsection contains some basic examples of fluid flow, with an
application to hurricane modelling.

3.6.1 Elements of Vector Calculus and Physical World
We present here some physical phenomena modeled by vector fields. Ac-
cording to Newton’s law of gravitation, two objects with masses m and M
attract each other with a force F of magnitude

‖F‖ = GmM/r²    (3.97)

where r is the distance between the two objects (treated as point masses),
and G is the gravitational constant (G = 6.673 × 10⁻¹¹ m³/(kg·s²)). Assume that
the object with mass M is located at the origin in 3-space, r is the position
vector of the object of mass m and r = ‖r‖, and the force F(r) exerted by
the object of mass M on the object of mass m points in the direction of the
unit vector −r/‖r‖. Thus, from (3.97),

F(r) = −(GmM/‖r‖²)(r/‖r‖) = −(GmM/‖r‖³) r,    (3.98)

or, in Cartesian coordinates,

F(x, y, z) = −GmM/(x² + y² + z²)^{3/2} (xi + yj + zk).

This defines a vector field whose domain D equals the whole space R3 except
the origin. It describes the gravitational force of a point mass M located at
the origin, as a function of the position of the point mass m.
Electric Force Field
Coulomb’s law states that the electrostatic force exerted by one charged par-
ticle on another is directly proportional to the product of the charges and
inversely proportional to the square of the distance between them. Let two
particles of charge Q and q be located at the origin of R3 and at the position
given by a vector r, respectively. Then the force F (r) that the particle of
charge Q exerts on the particle of charge q equals
F(r) = qQ/(4πε₀‖r‖³) r,    (3.99)

where ε₀ is a positive constant (called the permittivity constant or the dielectric constant). Note that the force is repellent (directed outward) if Q and
q have the same sign, and attractive otherwise. Formula (3.99) defines the
vector field F of the electrostatic force generated by the point charge Q at
the origin, as a function of the position of the point charge q. If we divide by q
we obtain the electrostatic force per unit charge, which is called the electric
field:

E = F/q.    (3.100)
Because of their form (3.97) and (3.99), both Newton’s law and Coulomb’s
law are instances of what is termed an inverse square law. Recall that, for a
scalar field ψ = ψ(x, y, z), the gradient is defined as

∇ψ = ∂ψ/∂x i + ∂ψ/∂y j + ∂ψ/∂z k,

and ψ is called a potential of the vector field F if F = ∇ψ. In this case, F is
called a gradient field.
Gravitational and Electric Potential
The gravitational force field (3.98) possesses the potential (called gravitational
potential)
ψ(r) = GmM/‖r‖.    (3.101)
Indeed, we have computed in Example 1.97(a), setting n = −1, that F = ∇ψ.
Analogously, the electric field (3.100) possesses the potential −ψ (called
electric potential; the minus sign is conventional) with

ψ(r) = Q/(4πε₀‖r‖).    (3.102)
We have seen in Section 3.3 that at each point in a gradient field F where
the gradient is non-zero, the latter points in the direction in which the rate
of increase of the corresponding potential is maximal, and that moreover the
gradient is perpendicular to the tangent plane of the level surface ψ(x, y, z) = c
through that point.
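The relation F = ∇ψ between (3.98) and (3.101) can be verified by finite differences; the sketch below uses a hypothetical value GmM = 1, since only the functional form matters:

```python
import math

# Finite-difference check that psi(r) = GmM/||r|| satisfies grad(psi) = F,
# where F(r) = -GmM * r / ||r||^3 is the inverse-square field (3.98).
GmM = 1.0  # illustrative value, not a physical constant

def psi(x, y, z):
    return GmM / math.sqrt(x * x + y * y + z * z)

def F(x, y, z):
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GmM * x / r3, -GmM * y / r3, -GmM * z / r3)

def grad(f, x, y, z, h=1e-6):
    # central differences in each coordinate
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2 * h))

p = (1.0, 2.0, 2.0)   # a point with ||r|| = 3
print(grad(psi, *p))  # ≈ F(1, 2, 2)
print(F(*p))
```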
Vector fields F with domain and range in the plane R2 can be represented
graphically by drawing the vectors F (r) at some points r of the plane. We
give some examples.
(i) Setting r(x, y) = xi + yj as above, we consider the vector field F(r) = r as
well as

F(r) = r/‖r‖ = x/√(x² + y²) i + y/√(x² + y²) j.

These vector fields are shown in Figures 3.11a and 3.11b respectively.

FIGURE 3.11a: Vector Field

(ii) The vector field defined as F(r) = F(x, y) = −yi + xj is called the spin
field or rotation field or turning field. F(r) is perpendicular to r, since

F(r)·r = −yx + xy = 0.

Moreover, we have ‖F(r)‖ = √(x² + y²) = ‖r‖. The vector fields F and F/‖F‖
are respectively shown in Figures 3.12a and 3.12b.
FIGURE 3.11b: Vector Field

Velocity and Acceleration
Let a particle be moving along a path having
position r(t) = x(t)i + y(t)j + z(t)k as t varies from a to b. Its velocity field v
also has the domain [a, b] and is defined as

v(t) = r′(t) = x′(t)i + y′(t)j + z′(t)k.    (3.103)

The speed of the particle is defined as the magnitude of the velocity and it
therefore is equal to
‖v(t)‖ = (x′(t)² + y′(t)² + z′(t)²)^{1/2} = ‖r′(t)‖.    (3.104)

Let s(t) denote the total distance traveled up to time t, that is,

s(t) = ∫ₐᵗ ‖r′(τ)‖ dτ.

Since s′(t) = ‖r′(t)‖, we see from (3.104) that the speed of the particle is
equal to s′(t). The acceleration a of the particle is defined as the rate of
change of the velocity with respect to time,

a(t) = v′(t) = r″(t) = x″(t)i + y″(t)j + z″(t)k.

Momentum
The momentum p of an object is defined as the mass of the
object times its velocity,

p(t) = mv(t) = mr′(t).

By Newton’s law, its time derivative

p′(t) = mr″(t) = ma(t)

equals the total (or net) force acting upon the object. We see therefore that
p′(t) = 0 if the net force is zero at time t.
FIGURE 3.12a: Spin Field F

Remark 43. As a consequence, the momentum p of an object stays constant
during time intervals in which no force is applied to it. This statement is called
the law of conservation of momentum. Indeed conservation laws (which assert
that certain quantities stay constant under certain conditions) are basic in-
gredients of the sciences, since in particular when analyzing situations with
several (or many) changing quantities it can be very helpful to identify quan-
tities which are not changing. Instead of saying ”quantity X stays constant”
one also says ”quantity X is invariant” or ”quantity X remains invariant”.
Angular Momentum If an object of mass m has velocity v(t) at position
r(t) at time t, its angular momentum L(t) w.r.t. the origin is defined by the
equation
L(t) = r(t) × p(t) = r(t) × mv(t) = r(t) × mr′(t).
Note that the angular momentum L(t) is perpendicular to r(t) and v(t). Ac-
cording to (3.18), its magnitude is given by

‖L(t)‖ = ‖r(t)‖ ‖mv(t)‖ sin θ(t),

where θ(t) is the angle between r(t) and v(t).

Example 214. A particle in the plane moves along the circle with radius 2
centered at the origin in such a way that its x- and y-coordinates are given
by x(t) = 2 cos t, y(t) = 2 sin t.
(a) Find the velocity, the speed, and the acceleration of the particle at an
arbitrary time t.
FIGURE 3.12b: Spin Field F/‖F‖

(b) Sketch the path of the particle and show the position and velocity vectors
at t = π/4.
Solution: (a) The position is described by the vector function

r(t) = 2 cos ti + 2 sin tj.

Its velocity and speed at time t are therefore

v(t) = r′(t) = −2 sin t i + 2 cos t j,

‖v(t)‖ = √((−2 sin t)² + (2 cos t)²) = √(4(sin² t + cos² t)) = 2.

Its acceleration at time t is a(t) = r″(t) = −2 cos t i − 2 sin t j. At time t = π/4,
we have

r(π/4) = 2 cos(π/4) i + 2 sin(π/4) j = √2 i + √2 j,
v(π/4) = r′(π/4) = −2 sin(π/4) i + 2 cos(π/4) j = −√2 i + √2 j.
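The derivatives above can be confirmed numerically (a sketch, not part of the text): central differences of the position recover the velocity, and the speed is the constant 2.

```python
import math

# Check of Example 214: differentiate r(t) = (2 cos t, 2 sin t) by
# central differences and compare with the analytic velocity.

def r(t):
    return (2 * math.cos(t), 2 * math.sin(t))

def v_exact(t):
    return (-2 * math.sin(t), 2 * math.cos(t))

def v_numeric(t, h=1e-6):
    (x1, y1), (x0, y0) = r(t + h), r(t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

t = math.pi / 4
vn, ve = v_numeric(t), v_exact(t)
speed = math.hypot(*ve)
print(vn, ve, speed)  # v(pi/4) ≈ (-sqrt(2), sqrt(2)), speed = 2
```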
Example 215. A particle of charge q moving in a magnetic field B is subject
to the so-called Lorentz force:
F = (q/c) v × B    (3.105)
where c is the speed of light and v is the velocity of the particle. Assume
that the magnetic field is constant and vertically oriented, B(t) = B0 k with
B0 6= 0. Find the path

r(t) = x(t)i + y(t)j + z(t)k

of the particle, given its initial position r(0) = r0 and velocity v(0) = v0 , as
well as its mass m.
Solution: By Newton’s law and (3.105) we have
m v′(t) = m r″(t) = F(t) = (q/c) v(t) × B(t).
Since B(t) = B0 k, we get
v′(t) = λ v(t) × k,  where λ = qB₀/(mc).    (3.106)
Written in components, with v(t) = v₁(t)i + v₂(t)j + v₃(t)k, (3.106) becomes

v₁′(t)i + v₂′(t)j + v₃′(t)k = λ[v₂(t)i − v₁(t)j].

This implies that

v₁′(t) = λv₂(t),  v₂′(t) = −λv₁(t),  v₃′(t) = 0.    (3.107)

Since v₃′(t) = 0 for all t, v₃ is a constant function, say v₃(t) = C. From the
first two equations of (3.107) we get

v₁″(t) = λv₂′(t) = −λ²v₁(t),

or

v₁″(t) + λ²v₁(t) = 0.    (3.108)
A solution of (3.108) is given by

v₁(t) = A sin(λt + ϕ),

because

v₁′(t) = λA cos(λt + ϕ),  v₁″(t) = −λ²A sin(λt + ϕ).

Since v₁′(t) = λv₂(t), we get

v₂(t) = v₁′(t)/λ = A cos(λt + ϕ).
Therefore

r′(t) = v(t) = A sin(λt + ϕ) i + A cos(λt + ϕ) j + Ck.

A final integration with respect to t gives us

r(t) = [−(A/λ) cos(λt + ϕ) + K₁] i + [(A/λ) sin(λt + ϕ) + K₂] j + [Ct + K₃] k,    (3.109)
where K₁, K₂, K₃ are constants of integration. All six constants A, ϕ, C,
K₁, K₂ and K₃ can be evaluated from the six initial conditions given by the
vector equations r(0) = r0 and v(0) = v0 , but we omit this computation
here. Instead, we observe that the path of the particle is a circular helix
with axis parallel to B, that is, parallel to k. One can see this from equation
(3.109): The z-component of r varies linearly with t, while the x- and y-
components represent uniform circular motion with angular velocity λ and
radius |A/λ| around the center (K₁, K₂). Thus, charged particles spiral around
the magnetic field lines. Qualitatively, this behavior still occurs even if the
magnetic field lines are curved, as in the case of the earth’s magnetic field.
Charged particles trapped by the earth’s magnetic field spiral around the
magnetic field lines that run from pole to pole.
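The structure of this solution can be checked numerically. The sketch below uses hypothetical constants A, ϕ, λ (not values from the text) and verifies that the components satisfy (3.107), so the in-plane speed is constant:

```python
import math

# Check of (3.106)-(3.108): v1 = A sin(lam*t + phi), v2 = A cos(lam*t + phi)
# satisfy v1' = lam*v2 and v2' = -lam*v1, hence sqrt(v1^2 + v2^2) = |A| is
# constant; the particle circles the field line while drifting along k.
A, phi, lam = 1.5, 0.3, 2.0   # illustrative values

v1 = lambda t: A * math.sin(lam * t + phi)
v2 = lambda t: A * math.cos(lam * t + phi)

def d(f, t, h=1e-6):
    """Central-difference derivative of f at t."""
    return (f(t + h) - f(t - h)) / (2 * h)

t = 0.7
print(d(v1, t), lam * v2(t))     # equal: v1' = lam*v2
print(d(v2, t), -lam * v1(t))    # equal: v2' = -lam*v1
print(math.hypot(v1(t), v2(t)))  # constant in-plane speed |A| = 1.5
```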
Example 216. A heat-seeking particle is located at the point (2, 3) on a flat
metal sheet whose temperature at a point (x, y) is

T(x, y) = 10 − 8x² − 2y².

Find an equation for the trajectory of the particle if it moves continuously
in the direction of maximum temperature increase.
Solution: Let the trajectory (a curve) be parameterized by r(t) = (x(t), y(t))
with r(0) = (2, 3). Since the direction of maximum temperature increase at
any point (x, y) is given by ∇T(x, y), the velocity vector v(t) of the particle at
time t points in the direction of the gradient at its current position r(t). Thus
there is a scalar µ that may depend on t such that

v(t) = µ(t)∇T(x(t), y(t)).

Since v(t) = r′(t) and ∇T(x, y) = (−16x, −4y), we get

x′(t)i + y′(t)j = µ(t)(−16x(t)i − 4y(t)j),

or, equating components,

x′(t) = −16µ(t)x(t),  y′(t) = −4µ(t)y(t).    (3.110)

Let M = M (t) be any antiderivative of µ. We may check that

x(t) = e^{−16M(t)} x₀,  y(t) = e^{−4M(t)} y₀

are solutions of (3.110) with initial values x(0) = x0 and y(0) = y0 . Using the
initial values x0 = 2 and y0 = 3 we see that the trajectory (x(t), y(t)) satisfies,
for all values of t, the equation
y = (3/2^{1/4}) x^{1/4}.    (3.111)
The graph of the trajectory and a contour plot of the temperature function are
shown in Figure 3.13.
FIGURE 3.13: Trajectory and Contour Plot
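One can confirm (3.111) directly (a sketch, not part of the text): with the choice µ(t) = 1, the explicit solution of (3.110) stays on the curve y = 3(x/2)^{1/4} for every t.

```python
import math

# Check of Example 216: taking mu(t) = 1, the solution of x' = -16x,
# y' = -4y with x(0) = 2, y(0) = 3 is x = 2e^{-16t}, y = 3e^{-4t};
# eliminating t via e^{-4t} = (e^{-16t})^{1/4} gives (3.111).

for t in [0.0, 0.05, 0.1, 0.5, 1.0]:
    x = 2 * math.exp(-16 * t)
    y = 3 * math.exp(-4 * t)
    assert abs(y - 3 * (x / 2) ** 0.25) < 1e-9
print("trajectory lies on y = 3*(x/2)**0.25")
```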

Remark 44. The preceding example exhibits dynamics controlled by the
gradient of some scalar field. This is called gradient flow. There are numerous
applications where such a situation arises.
Example 217. Assume that a certain distribution of electric charges in the
plane produces the electric potential ψ(x, y) = e^{−2x} cos(2y).
(i) Find the electric field vector E = −∇ψ at (π/4, 0).
(ii) Find the direction in which the potential decreases most rapidly at this
point.
Solution: (i) We have ∇ψ(x, y) = −2e^{−2x} cos(2y) i − 2e^{−2x} sin(2y) j, so

E(π/4, 0) = −∇ψ(π/4, 0) = 2e^{−π/2} i.
4
(ii) At (π/4, 0), the potential decreases most rapidly in the direction of
−∇ψ(π/4, 0), which was computed in (i).
Suppose a rigid object rotates with constant angular speed around an axis
through the origin with direction a ∈ R3 . Let r = xi + yj + zk be any point of
space. We decompose r = r⊥ +αa into a radial component r⊥ and a component
αa parallel to a, where α is a scalar. Since any material point of the object
which passes through a fixed space point r will have the same velocity vector,
we can associate with this motion a velocity field v = v(x, y, z). We see that
v is perpendicular to r⊥ as well as to the rotation axis a, and the speed ‖v‖
is proportional to ‖r⊥‖ = ‖r‖ sin θ. Therefore, there exists a unique vector ω
which is parallel to a such that

v = ω × r⊥ = ω × r.

Angular velocity as the curl of linear velocity: This vector ω is called the
angular velocity of the object; its length ‖ω‖ equals the angular speed. Let
ω = Ai + Bj + Ck. Then

v(x, y, z) = (Ai + Bj + Ck) × (xi + yj + zk)

= (Bz − Cy)i + (Cx − Az)j + (Ay − Bx)k.


We compute

curl(v) = det [ i, j, k ; ∂/∂x, ∂/∂y, ∂/∂z ; Bz − Cy, Cx − Az, Ay − Bx ]
        = 2Ai + 2Bj + 2Ck = 2ω.    (3.112)
Thus the angular velocity of a uniformly rotating body equals one-half the
curl of the linear velocity, as the latter is called in this context to emphasize
its different character.
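Identity (3.112) can be verified by finite differences; the components A, B, C below are arbitrary illustrative values, not data from the text:

```python
# Check of (3.112): for v = omega x r with omega = (A, B, C),
# curl v = 2*omega at every point; v is linear, so central
# differences reproduce the derivatives exactly.
A, B, C = 1.0, -2.0, 0.5   # illustrative components of omega

def v(x, y, z):
    return (B * z - C * y, C * x - A * z, A * y - B * x)

def curl(f, x, y, z, h=1e-6):
    def partial(i, p):
        # derivative of all three components of f w.r.t. coordinate i
        q = list(p); q[i] += h
        r = list(p); r[i] -= h
        return [(a - b) / (2 * h) for a, b in zip(f(*q), f(*r))]
    dfdx, dfdy, dfdz = (partial(i, [x, y, z]) for i in range(3))
    return (dfdy[2] - dfdz[1], dfdz[0] - dfdx[2], dfdx[1] - dfdy[0])

print(curl(v, 0.3, -1.2, 2.0))  # ≈ (2A, 2B, 2C) = (2.0, -4.0, 1.0)
```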

3.6.2 Applications of Line Integrals
Line integrals have been introduced in Section 3.4.1. Here we present some
applications in the area of mechanics.
Line Integrals of Scalar Fields
Let us consider a bent wire as a (one-dimensional) smooth curve r(t) = x(t)i+
y(t)j + z(t)k in space, where r : [a, b] → R3 . We assume that the distribution
of its mass is described by a continuous density function ρ (in units of mass
per unit length) defined on the set C = r([a, b]) of the curve points. We have
already mentioned in Subsection 3.4.1 that its total mass M can be expressed
as the line integral
M = ∫_C ρ ds = ∫ₐᵇ ρ(r(t)) ‖r′(t)‖ dt.

Its center of mass is another quantity of interest in mechanics. We define the
coordinates of the so-called first moment of ρ as the line integrals

M_x = ∫_C xρ ds,  M_y = ∫_C yρ ds,  M_z = ∫_C zρ ds,

that is,

M_x = ∫ₐᵇ x(t)ρ(r(t)) ‖r′(t)‖ dt,  M_y = . . . ,  M_z = . . . .

The coordinates of the center of mass are then given by

x̄ = M_x/M,  ȳ = M_y/M,  z̄ = M_z/M.
Let us now assume that the wire rotates around an axis. Its moment of inertia
with respect to this axis is given by the line integral

I = ∫_C d² ρ ds,

where d(x, y, z) denotes the distance of the point (x, y, z) from the axis. In
particular, we obtain the moment of inertia of the wire with respect to the
x-axis as

I_x = ∫_C (y² + z²) ρ ds.
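As an illustration of these formulas (a sketch with a hypothetical wire, not an example from the book), consider a helical wire r(t) = (cos t, sin t, t), 0 ≤ t ≤ 2π, of constant density ρ = 1; here ‖r′(t)‖ = √2, so M = 2π√2 and the center of mass is (0, 0, π) by symmetry:

```python
import math

# Mass and center of mass of a helical wire via the line-integral formulas.

def line_integral(g, n=20000):
    """Approximate the integral of g(t)*||r'(t)|| over [0, 2*pi]."""
    h = 2 * math.pi / n
    speed = math.sqrt(2.0)                  # ||r'(t)|| for this helix
    return sum(g((k + 0.5) * h) * speed * h for k in range(n))

M  = line_integral(lambda t: 1.0)           # total mass (rho = 1)
Mx = line_integral(lambda t: math.cos(t))   # first moments
My = line_integral(lambda t: math.sin(t))
Mz = line_integral(lambda t: t)

print(M, Mx / M, My / M, Mz / M)  # ≈ 2*pi*sqrt(2), 0, 0, pi
```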

Line Integrals of Vector Fields
Let C be the straight line connecting two points r0 and r1 in space. We know
from elementary mechanics that if a mass is moved from r0 to r1 under the
influence of a constant force F , the work W done by the force is given by

W = F·(r₁ − r₀) = ‖F‖ ‖r₁ − r₀‖ cos α,

where α is the angle between the vectors F and r₁ − r₀. Let us now parameterize C as r(t) = r₀ + t(r₁ − r₀) with r : [0, 1] → R³. Since r′(t) = r₁ − r₀,
we have

F·(r₁ − r₀) = ∫₀¹ F·r′(t) dt.
The latter integral is nothing more than the line integral of Definition 70, so
we can express the work W as the line integral

W = ∫_C F·dr.    (3.113)

This line integral also yields the correct value of the total work done by an
arbitrary force field F(x, y, z) = f₁(x, y, z)i + f₂(x, y, z)j + f₃(x, y, z)k
on a mass which moves along an arbitrary curve C from its initial point
r(a) to its final point r(b), if C is given by r = x(t)i + y(t)j + z(t)k with
r : [a, b] → R³. This can be seen if we approximate C by a curve consisting
of pieces of straight lines and pass to the limit. The procedure is analogous
to the one used to compute the length of a curve; we will not carry out the
details.
Example 218. Let F (x, y, z) = i − yj + xyzk be a force field. Calculate the
work done when moving a particle from (0, 0, 0) to (1, −1, 1) along the curve
x = t, y = −t2 , z = t, 0 ≤ t ≤ 1.
Solution: The work done is equal to the line integral ∫_C F·dr. We compute

∫_C F·dr = ∫_C dx − y dy + xyz dz = ∫₀¹ [1·x′ − y y′ + xyz z′](t) dt
= ∫₀¹ (1 − 2t³ − t⁴) dt = [t − t⁴/2 − t⁵/5]₀¹ = 1 − 1/2 − 1/5 = 3/10.
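The value 3/10 can be reproduced by a direct numerical quadrature of the same integrand (a sketch, not part of the text):

```python
# Midpoint-rule evaluation of the work integral in Example 218:
# F = (1, -y, xyz) along x = t, y = -t^2, z = t gives 3/10.

def work(n=20000):
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y, z = t, -t * t, t
        Fx, Fy, Fz = 1.0, -y, x * y * z
        dx, dy, dz = 1.0, -2 * t, 1.0   # (x', y', z')
        total += (Fx * dx + Fy * dy + Fz * dz) * h
    return total

print(work())  # ≈ 0.3 = 3/10
```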
Example 219. Find the work done by F (x, y, z) = x2 i − 2yzj + zk in moving
an object along the line segment from (1,1,1) to (4,4,4).
Solution: First parameterize the line segment as x = y = z = 1 + 3t with
0 ≤ t ≤ 1, so x′(t) = y′(t) = z′(t) = 3. The work done by F moving the object
along C is given by

W = ∫_C F·dr = ∫₀¹ [x² x′ − 2yz y′ + z z′](t) dt
= ∫₀¹ [(1 + 3t)² − 2(1 + 3t)² + (1 + 3t)]·3 dt
= [(1 + 3t)²/2 − (1 + 3t)³/3]₀¹ = −27/2.
In Definition 72 we have called a vector field F conservative if it possesses
a potential function, that is, if F = ∇ψ for some scalar field. The following
remark explains this terminology.
Remark 45. (Conservation of mechanical energy)
Let an object of mass m move in space according to Newton's law F = ma =
mr″. Assume that the force field F has a potential function ψ, so F = ∇ψ.
In mechanics, one considers the potential energy −ψ and the kinetic energy
(1/2)m‖v‖², where v = r′. The total mechanical energy E(t) of the object
at time t is given as the sum of its kinetic and its potential energy at that
time, so

E(t) = (1/2) m r′(t)·r′(t) − ψ(r(t)).    (3.114)
Let us compute its time derivative E′(t). Using the product rule and the chain
rule we get

E′(t) = (1/2) m r″(t)·r′(t) + (1/2) m r′(t)·r″(t) − ∇ψ(r(t))·r′(t)
= [m r″(t) − ∇ψ(r(t))]·r′(t).

Since m r″(t) = F(r(t)) = ∇ψ(r(t)), we conclude that E′(t) = 0. This means
that the total energy remains constant (or, is conserved) along the trajectory
of the object.
Remark 46. As we have seen in Section 3.6.1, the gravitational field pos-
sesses a potential and hence it is conservative.
3.6.3 Applications of Surface Integrals
Mass on Surface
Let Σ be a piecewise smooth bounded surface. We already know from Subsection 3.4.2 that its area is given by the surface integral of the constant 1 over Σ,

A(Σ) = ∬_Σ 1 dσ.
Assume now that the surface carries mass whose distribution is described by
a continuous density function ρ = ρ(x, y, z) (in units of mass per unit area)
defined on Σ. The total mass carried by the surface is then given by
M = ∬_Σ ρ dσ.

Flux of a Vector Field across Surface
Consider the motion of a fluid (a liquid or a gas) through a pipe with rectangular cross-section Σ₀ of area A₀.
Let Σ be a (hypothetical) rectangular surface obtained by intersecting the
pipe with a plane tilted by an angle α, see Figure 3.14.

FIGURE 3.14: Flux

Assume for the moment that the fluid moves with constant velocity vector v
during a certain time interval [0, T ]. At time T , those particles of the fluid
which have passed through Σ during the interval [0, T ] occupy the region D.
The volume of D is given by

vol(D) = ‖v‖ T A₀ = ‖v‖ T A cos α = T A v·n,    (3.115)
where n is a unit normal to Σ, and A is the area of Σ. Thus, the number Av.n
gives the rate (per unit time) of fluid volume passing through Σ. The sign of
this number depends on the direction of the flow (which is the direction of the
velocity vector v) as well as on the orientation of the normal n.
In general, the velocity field v = v(x, y, z, t) is not constant, but depends on
space and time, and the surface Σ does not lie in a plane. If Σ, however, consists
of plane parts Σⱼ with unit normals nⱼ and areas Aⱼ on which, moreover, v
is a constant vector vⱼ, then by the above considerations the rate of fluid volume
passing through all of Σ is equal to ∑ⱼ Aⱼ vⱼ·nⱼ. For the general case, the surface
integral can be viewed as the result of a limit procedure of approximating Σ
and v.n in such a piecewise fashion. Thus, the rate of fluid volume passing
through Σ at time t is equal to
∬_Σ v(·, t)·n dσ.    (3.116)
Here, the notation v(·, t) indicates that the surface integral ∬_Σ f dσ has to be
evaluated with the integrand f (x, y, z) = v(x, y, z, t).n(x, y, z) for that value
of the time t for which we want to determine the rate. The integral in (3.116)
is called the flux of the vector field v across the surface Σ. In this context,
the vector field v is called the corresponding flux density. Since the rate of a
quantity is nothing else than its time derivative, we obtain the total volume
of the fluid passing through Σ during some time interval [t1 , t2 ] as the integral
of (3.116) with respect to time,
∫_{t₁}^{t₂} ∬_Σ v(·, t)·n dσ dt.    (3.117)

If we want to determine the rate of fluid mass (instead of volume) passing
through Σ, we have to replace the vector field v by the vector field ρv, where
ρ is the mass density (mass per unit volume). Indeed, in the situation described
above by Figure 3.14, the total mass of the fluid occupying the region D equals
ρvol(D) = T Aρv.n, and the subsequent considerations yield the rate of fluid
mass passing through Σ as
∬_Σ ρ(·, t) v(·, t)·n dσ.    (3.118)

Note that (3.118) also applies to situations where ρ is not constant, that is, if
the fluid in question is compressible. This typically occurs when the fluid is a
gas.
Accordingly, for an arbitrary vector field F = F (x, y, z, t) with values in R3
one defines the flux of F across the surface Σ as the surface integral
∬_Σ F(·, t)·n dσ,  or  ∬_Σ F·n dσ.    (3.119)

The latter notation is used if F is stationary (that is, F does not depend on t)
or if one just wishes to keep the notation simple. In this context, the vector
field F is called the flux density.
Let the surface Σ enclose a region D in space, and let n be the field of
outer unit normals on Σ. At points of Σ where F·n > 0 (< 0), the field vector
F points outward (inward). Thus in this case, the integral (3.119) yields the
outward flux across Σ.
Interpretation of Divergence
We have just seen that the surface integral ∬_Σ F·n dσ over the boundary Σ
of a region D in space can be understood as the outward flux of the vector field
F . In combination with the divergence theorem, this yields an interpretation
of the divergence divF at a point P = (x, y, z) as follows. Assume that F is
continuously differentiable and let D be any region which includes P . If D is
small, then divF (x, y, z) is close to the average value of divF over D,
div F(x, y, z) ≈ (1/vol D) ∭_D div F dV.    (3.120)

Actually, when divF is continuous as we have assumed, we can prove that this
average value tends to divF (x, y, z) in the limit when D shrinks to the single
point P. From the divergence theorem (Theorem 39) we obtain

(1/vol(D)) ∭_D div F dV = (1/vol(D)) ∬_Σ F·n dσ,    (3.121)

where Σ is the boundary of D. Putting together (3.120) and (3.121) we
see that the number div F(x, y, z) yields the outward flux per unit volume
near the point P = (x, y, z), in the limit of small volumes. In particular, if
div F(x, y, z) < 0 the quantity described by F tends to accumulate near P,
whereas if div F(x, y, z) > 0 the quantity tends to move away from P. In the
first case P is sometimes called a sink, and in the second case it is called a
source.
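This interpretation can be illustrated numerically (a sketch with an arbitrary illustrative field, not one from the text): the flux of F out of a small cube around P, divided by the cube's volume, approximates div F(P).

```python
# Flux per unit volume for a small cube around P approximates div F(P).
# Here F = (x^2, x*y, z) is an illustrative field with div F = 3x + 1.

def F(x, y, z):
    return (x * x, x * y, z)

def flux_per_volume(P, h=1e-3):
    x, y, z = P
    # midpoint-rule flux across the six faces of a cube with side h
    flux = ((F(x + h/2, y, z)[0] - F(x - h/2, y, z)[0]) * h * h +
            (F(x, y + h/2, z)[1] - F(x, y - h/2, z)[1]) * h * h +
            (F(x, y, z + h/2)[2] - F(x, y, z - h/2)[2]) * h * h)
    return flux / h**3

P = (0.5, -1.0, 2.0)
print(flux_per_volume(P))  # ≈ 3*0.5 + 1 = 2.5
```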
It is instructive to look at this situation in more detail. Let D be a rect-
angular box with P as one of its corner points, see Figure 3.15. Let us look at
the front face II and the back face I in the figure. On face II, the unit outer
normal is constant and equal to the vector i. Consequently, the outward flux
across face II is
∬_{II} F·n dσ = ∬_{II} F·i dσ = ∬_{II} f₁ dσ,

where F = f₁i + f₂j + f₃k. On face I, the unit outer normal is −i, so the sign
is reversed, and we obtain

∬_{II} F·n dσ + ∬_I F·n dσ = ∬_{II} f₁ dσ − ∬_I f₁ dσ.    (3.122)

For the other four faces we get corresponding expressions involving the com-
ponents f2 and f3 of F . Now let us apply the divergence theorem to the vector
FIGURE 3.15: Interpretation of Divergence Theorem

field F̃ = f₁ i, instead of F. We have div F̃ = ∂f₁/∂x and therefore

∭_D ∂f₁/∂x dV = ∭_D div F̃ dV = ∬_Σ F̃·n dσ = ∬_{II} f₁ dσ − ∬_I f₁ dσ,    (3.123)
since the contributions of the other four faces are zero for F̃ . Therefore,
∂f₁/∂x (x, y, z) ≈ (1/vol D) ∭_D ∂f₁/∂x dV = (1/vol D) [∬_{II} f₁ dσ − ∬_I f₁ dσ]    (3.124)
for small values of ∆x, ∆y and ∆z. Thus, the number ∂f₁/∂x expresses the
difference of outward flux across face II and inward flux across face I, per
unit volume. The partial derivatives ∂f₂/∂y and ∂f₃/∂z can be interpreted in an
analogous manner.
3.6.4 Applications of Gauss Divergence Theorem
Archimedes Principle
We use the divergence theorem to prove the validity of the Archimedes prin-
ciple. This principle states that the buoyant force, exerted by a fluid on a
solid object immersed in it, is equal to the weight of the fluid displaced by the
object. This forms the basis for a ship to float or sink in the ocean. The issue
is the balance between the force of gravity and the buoyant force. Consider
a solid object immersed in a fluid which occupies a region D bounded by a
piecewise smooth surface Σ with unit outer normal field n. Assume that the
fluid is at rest, has constant density ρ and is acted upon by gravity in the
downward direction, with constant gravity acceleration g. In mechanics, as a
consequence of the balance of momentum, the pressure p = p(x, y, z) within
the fluid satisfies
p(x, y, z) = p₀ − ρgz,    (3.125)
if p has a constant value p0 in the horizontal plane z = 0 (for example, the
atmospheric pressure at the ocean surface). We see that the pressure increases
linearly with depth −z (the vertical coordinate is directed upward). The total
buoyant force on the object is given by the vertical component f3tot of the total
force F tot = f1tot i + f2tot j + f3tot k exerted by the fluid on the object. (This is a
single vector, not a vector field.) On the surface Σ, the fluid pressure acts as
a force density (force per unit area) of magnitude p and direction −n. Since
−pn.k is the vertical component of this force density, the total buoyant force
is obtained as the surface integral
f₃^tot = ∬_Σ −p n·k dσ = ∬_Σ (ρgz − p₀) n·k dσ.    (3.126)

As an illustration, if Σ0 would be a flat horizontal piece, having area A0 , of the
lower part of Σ at depth d = −z (so the force acts from below with n = −k),
the buoyant force on this piece would be equal to (ρgd + p0 )A0 .
In order to apply the divergence theorem to (3.126), we define the vector
field G = −pk. From (3.125) we see that
(div G)(x, y, z) = −∂p/∂z (x, y, z) = ρg    (3.127)
is constant, so the total buoyant force becomes
f₃^tot = ∬_Σ −p n·k dσ = ∬_Σ −p k·n dσ = ∬_Σ G·n dσ
= ∭_D (div G) dV = ∭_D ρg dV = ρg vol(D).    (3.128)

Since m = ρvol(D) is the mass of the fluid displaced by the object, its weight
mg (the force exerted by gravity on the fluid mass) equals the buoyant force
acting on the object. In the same manner, one can show that f1tot = f2tot = 0,
that is, the total force on the object is indeed vertically upward. To see this,
one replaces k by i or j in the derivation prior to (3.127). Note that the field
G in this case satisfies div G = 0, because ∂p/∂x = ∂p/∂y = 0. Thus we have derived
the principle of Archimedes from the balance of momentum which underlies
Equation (3.125) for the pressure.
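The cancellation of p₀ and of the depth can be observed numerically. The sketch below (with illustrative values for ρ, g, p₀, the radius a and the depth d, none of them from the text) evaluates the surface integral (3.126) for a sphere and compares it with ρg·vol(D):

```python
import math

# Buoyant force on a sphere of radius a centered at depth d, from the
# surface integral (3.126) with p = p0 - rho*g*z; it equals rho*g*(4/3)*pi*a^3
# regardless of p0 and d.
rho, g, p0, a, d = 1000.0, 9.81, 101325.0, 0.2, 5.0   # illustrative values

def buoyant_force(n=400):
    total = 0.0
    dth = math.pi / n
    for i in range(n):
        th = (i + 0.5) * dth
        z = -d + a * math.cos(th)          # point on the sphere
        nk = math.cos(th)                  # n . k on the sphere
        p = p0 - rho * g * z               # pressure (3.125)
        # dsigma = a^2 sin(th) dth dphi; the phi integral contributes 2*pi
        total += (-p) * nk * a * a * math.sin(th) * dth * (2 * math.pi)
    return total

expected = rho * g * (4.0 / 3.0) * math.pi * a ** 3
print(buoyant_force(), expected)
```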
Mass Balance
Consider a region G through which a fluid moves which has mass density
ρ = ρ(x, y, z, t) and velocity v = v(x, y, z, t). Let D be any subregion of G
with boundary surface Σ and unit outer normal field n. At any given time
t, the total mass M (t) contained in D is given by the volume integral of the
mass density with respect to the spatial coordinates (x,y,z),
M(t) = ∭_D ρ(·, t) dV.    (3.129)

On the other hand, when discussing the flux of a vector field in Subsection
3.6.3, we have seen in (3.117) that the rate of fluid mass passing through the
surface Σ in the outward direction is given by
∬_Σ ρ(·, t) v(·, t)·n dσ.    (3.130)

In continuum mechanics, the principle of mass conservation states that the
rate of change M 0 (t) of the total mass M (t) in D equals the rate of mass
passing through the boundary of D (that is, the only way for the total mass
to change is that part of it leaves or enters D through the boundary). In view
of (3.129) and (3.130) this means that
(d/dt) ∭_D ρ(·, t) dV = M′(t) = −∬_Σ ρ(·, t) v(·, t)·n dσ.    (3.131)

This is the equation of mass balance. If we interchange the time derivative
with the volume integral (this interchange is justified for sufficiently smooth
ρ), (3.131) becomes
∫∫∫_D ∂ρ/∂t (., t) dV = − ∫∫_Σ ρ(., t) v(., t).n dσ.   (3.132)
By the Gauss divergence theorem,
∫∫_Σ (ρv)(., t).n dσ = ∫∫∫_D div(ρv)(., t) dV.   (3.133)
We combine the two previous equations and merge all components under one
integral sign to obtain
∫∫∫_D [∂ρ/∂t (., t) + div(ρv)(., t)] dV = 0.   (3.134)
This is valid for an arbitrary subregion D of G for which the divergence
theorem can be applied. A theorem of analysis (which we do not discuss in
this book) says that this is only possible if the integrand is equal to the zero
function, that is, if
∂ρ/∂t + div(ρv) = 0   (3.135)
holds at all points (x, y, z) of G and for all times t considered. If the density
ρ is constant, we have ∂ρ/∂t = 0 and div(ρv) = ρ divv, so (3.135) becomes
divv = 0,   (3.136)
that is, the velocity field is divergence free.


Equation (3.135), called the continuity equation, is one of the fundamental
balance equations of continuum mechanics, used explicitly or implicitly in
almost all computer simulations of the behavior of fluids.
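The continuity equation (3.135) can be spot-checked on a concrete flow; the density and velocity fields below are our own illustrative choices (sympy assumed available as a tool):

```python
# Verify (3.135) for a sample compressible flow and (3.136) for a shear flow.
import sympy as sp

x, y, z, t = sp.symbols("x y z t")
rho = sp.exp(-t)                     # illustrative density, decaying in time
v = (x / 3, y / 3, z / 3)            # illustrative expanding velocity field

div_rho_v = sum(sp.diff(rho * vi, xi) for vi, xi in zip(v, (x, y, z)))
assert sp.simplify(sp.diff(rho, t) + div_rho_v) == 0   # (3.135) holds

# With constant density, (3.136) requires div v = 0, e.g. for a shear flow:
w = (y, -x, sp.Integer(0))
assert sum(sp.diff(wi, xi) for wi, xi in zip(w, (x, y, z))) == 0
print("continuity checks passed")
```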

Heat Flow
In this subsection we show that the heat equation of Example 8.4.15 can be
obtained as a consequence of the principle of energy conservation, using the
divergence theorem. The derivation proceeds along similar lines as in the pre-
vious subsection, so our exposition will be shorter. The total amount of heat
contained in some region D at a time t is given by
Q(t) = ∫∫∫_D (ρu)(., t) dV,   (3.137)
where u denotes the specific inner energy (that is, energy per mass). Denoting
further its flow by q, energy conservation tells us that
d/dt ∫∫∫_D (ρu)(., t) dV = Q′(t) = − ∫∫_Σ q(., t).n dσ.   (3.138)
The divergence theorem yields
∫∫_Σ q(., t).n dσ = ∫∫∫_D (divq)(., t) dV.
As in the preceding subsection we can show that the previous two equations
imply that

∂/∂t (ρu) + divq = 0   (3.139)
∂t
holds at all points (x, y, z) and times t considered. We assume that u = cT +
α, where T is the temperature, and the specific heat capacity c as well as
the number α and the mass density ρ are constant. Fourier’s law of heat
conduction states that
q = −κ∇T, (3.140)
where κ is a constant, called the heat conductivity. Since div(∇T) = ∆T,
where ∆ denotes the Laplace operator, we obtain from (3.139) the heat
equation
∂T/∂t = (κ/(cρ)) ∆T.   (3.141)
The steady state case occurs when T does not change with time. In this case
∂T/∂t = 0 and (3.141) becomes the Laplace equation
∆T = 0.   (3.142)

Both preceding equations are bases for the modelling and simulation of heat
conduction and flow in many kinds of engineering problems.
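As a rough numerical illustration of (3.141) and (3.142) in one space dimension, the explicit finite-difference sketch below (grid sizes, the diffusivity and the boundary data are our own illustrative choices) relaxes an initial temperature bump toward the steady state:

```python
# Explicit finite-difference sketch of the 1D heat equation u_t = a*u_xx
# with ends held at zero; grid sizes and a are illustrative choices.
import math

a, n = 1.0, 21
dx = 1.0 / (n - 1)
dt = 0.4 * dx * dx / a               # within the stability bound dt <= dx^2/(2a)
u = [math.sin(math.pi * i * dx) for i in range(n)]   # initial temperature bump
u[0] = u[-1] = 0.0                   # end temperatures fixed at zero

for _ in range(2000):
    un = u[:]
    for i in range(1, n - 1):
        un[i] = u[i] + a * dt / dx**2 * (u[i-1] - 2 * u[i] + u[i+1])
    u = un

# For large times the solution approaches the steady state of (3.142),
# here the zero function (a straight line between the end temperatures).
print(max(abs(ui) for ui in u) < 1e-6)   # True
```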

3.6.5 Application of Stokes Theorem


As an application of Stokes theorem, we present an interpretation of the
curl of a vector field by measuring its rotation. Let v = v(x, y, z) be the
velocity field of a fluid and P0 be any point in the fluid. Suppose Σr is a disk
of radius r with center P0 , with unit normal vector n and boundary circle Cr
as indicated in Figure 3.16. Because the disk is flat, the normal vector n is
constant on Σr . By Stokes theorem
∮_{Cr} v.dr = ∫∫_{Σr} (curlv).n dσ.   (3.143)

If the parametrization r of Cr is chosen such that ‖r′(t)‖ = 1 along the
curve, then v(r(t)).r′(t) is the tangential component of the velocity, and the
line integral ∮ v.dr gives the circulation of the fluid around Cr. Assuming v
to be continuously differentiable, for small r we have
(curlv)(P0).n ≈ (1/(πr²)) ∫∫_{Σr} (curlv).n dσ,

since the right hand side represents the average of (curlv).n over Σr, and the
area of Σr equals πr². In fact, since curlv is continuous and Σr shrinks to the
point P0 as r → 0,
(curlv)(P0).n = lim_{r→0} (1/(πr²)) ∫∫_{Σr} (curlv).n dσ.   (3.144)

Putting (3.143) and (3.144) together we arrive at


(curlv)(P0).n = lim_{r→0} (1/(πr²)) ∮_{Cr} v.dr.   (3.145)

Thus, the number (curlv)(P0 ).n represents the circulation of v per unit area
in the plane normal to n near P0 . In this manner, the curl of the velocity
serves as a local model for the rotation of the fluid.
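Formula (3.145) can be verified numerically for the rigid rotation v = (−y, x, 0), whose curl is (0, 0, 2); the quadrature below is an illustrative sketch, not part of the text:

```python
# Numeric check of (3.145) for the rigid rotation v = (-y, x, 0), whose curl
# is (0, 0, 2): circulation around C_r divided by pi*r^2 should approach 2.
import math

def circulation(r, steps=10000):
    total, dt = 0.0, 2 * math.pi / steps
    for k in range(steps):
        t = k * dt
        x, y = r * math.cos(t), r * math.sin(t)
        vx, vy = -y, x                                        # velocity on C_r
        dx, dy = -r * math.sin(t) * dt, r * math.cos(t) * dt  # r'(t) dt
        total += vx * dx + vy * dy
    return total

for r in (1.0, 0.1, 0.01):
    print(circulation(r) / (math.pi * r * r))   # close to 2 for every radius
```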
FIGURE 3.16: Interpretation of Curl (disk Σr of radius r with boundary circle Cr).

3.6.6 Example of Planar Fluid Flow


We begin with two basic situations, sink flow and vortex flow, in a situation
made as simple as possible. Consider an incompressible fluid (its density ρ is
constant) whose flow takes place in the plane and is stationary, that is, its
velocity field v = v(x, y) does not depend on time. It is also assumed that the
fluid is inviscid, that is, its internal friction (the viscosity) is zero.
Sink flow. Imagine a hole at the origin (the sink) where the fluid leaves the
plane and that (i) the fluid flows toward the origin, that is, the velocity vector
at every point (x, y) is directed toward the origin, (ii) the flow is radially
symmetric, that is, the speed of the fluid is the same at all points of every
given circle centered at the origin.
Conditions (i) and (ii) above are modeled by a velocity field of the form
v(x, y) = β(r)(xi + yj),   r = √(x² + y²),   (3.146)

with a function β = β(r) < 0 yet to be fixed. We have seen in Section 3.6.4 that
we must have divv = 0 in order to satisfy the principle of mass conservation.
Using the chain rule we compute

∂v1/∂x = β(r) + xβ′(r) ∂r/∂x = β(r) + β′(r) x²/√(x² + y²).
In the same manner we obtain
∂v2/∂y = β(r) + β′(r) y²/√(x² + y²).

Since in two dimensions
divv = ∂v1/∂x + ∂v2/∂y,
we get
divv = 2β(r) + β′(r) (x² + y²)/√(x² + y²) = 2β(r) + rβ′(r).
The condition divv = 0 leads to
β(r) = c/r²
for some constant c, which must be negative by the above. Setting c = −q/(2π)
(q is called the sink strength), (3.146) becomes
v(x, y) = −(q/(2π(x² + y²))) (xi + yj).   (3.147)
Since it follows that ‖v(x, y)‖ = q/(2πr), the sink flow has the further property
that (iii) the speed of the fluid at any point P = (x, y) is inversely proportional
to the distance of P from the origin; in particular, it tends to +∞ as that
distance tends to zero.
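A symbolic check (sympy is our own choice of tool) that the sink flow (3.147) satisfies divv = 0 and has the speed claimed in property (iii):

```python
# Symbolic check that the sink flow (3.147) is divergence free and that its
# speed is q/(2*pi*r), with r^2 = x^2 + y^2.
import sympy as sp

x, y, q = sp.symbols("x y q", positive=True)
r2 = x**2 + y**2
v1 = -q * x / (2 * sp.pi * r2)
v2 = -q * y / (2 * sp.pi * r2)

assert sp.simplify(sp.diff(v1, x) + sp.diff(v2, y)) == 0      # div v = 0
# property (iii): ||v||^2 = (q/(2*pi*r))^2
assert sp.simplify(v1**2 + v2**2 - (q / (2 * sp.pi))**2 / r2) == 0
print("sink flow checks passed")
```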
Vortex flow. Here the fluid flows along concentric circles around the origin
in the counterclockwise direction, that is,
(i) The velocity vector v(x, y) at a point (x, y) is tangent to the circle cen-
tered at the origin which passes through (x, y),
(ii) v(x, y) points in the counterclockwise direction.
(iii) Moreover, the speed k v k is constant along those circles, and
(iv) The speed is inversely proportional along any such circle to the radius r
of the latter, and hence it tends to +∞ as r tends to 0. The vector field
v(x, y) = (k/(2π(x² + y²))) (−yi + xj)   (3.148)

possesses those four properties. (The constant k > 0 is called the vortex
strength.) Indeed, by (3.148) we have ‖v(x, y)‖ = k/(2π√(x² + y²)) = k/(2πr).
Moreover, v(x, y).(xi + yj) = 0, so v(x, y) is perpendicular to the radius
vector of the circle, and by drawing a picture one sees that the direction
is counterclockwise.
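The corresponding symbolic check for the vortex flow (3.148), again using sympy as an illustrative tool:

```python
# Symbolic check of the vortex flow (3.148): tangent to circles about the
# origin, divergence free, with speed k/(2*pi*r).
import sympy as sp

x, y, k = sp.symbols("x y k", positive=True)
r2 = x**2 + y**2
v1 = -k * y / (2 * sp.pi * r2)
v2 = k * x / (2 * sp.pi * r2)

assert sp.simplify(v1 * x + v2 * y) == 0                  # perpendicular to x i + y j
assert sp.simplify(sp.diff(v1, x) + sp.diff(v2, y)) == 0  # div v = 0
assert sp.simplify(v1**2 + v2**2 - (k / (2 * sp.pi))**2 / r2) == 0
print("vortex flow checks passed")
```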
Streamlines and stream functions. The paths followed by the fluid par-
ticles in a fluid flow are called the streamlines of the flow. If the streamlines
can be represented as the level curves of some function ψ = ψ(x, y), then ψ is
called the stream function of the flow. In this case, every particle path must
satisfy ψ(x(t), y(t)) = c for a suitable constant c. By the chain rule,
∂ψ/∂x (x(t), y(t)) ẋ(t) + ∂ψ/∂y (x(t), y(t)) ẏ(t) = 0
must hold along particle paths. Therefore, ψ and the velocity field v of the
flow are related by
∇ψ(x, y).v(x, y) = 0 (3.149)
at all points (x, y) in the domain of the flow; the velocity vectors are tangent,
while the vectors ∇ψ(x, y) are perpendicular to the streamlines.
Combined sink and vortex flow. Here we have the velocity field
v(x, y) = (1/(2π(x² + y²))) [(−qx − ky)i + (−qy + kx)j].   (3.150)
The particles in this flow rotate while moving toward the sink, so we expect
that they spiral inward. In order to find the streamlines, we will compute its
stream function, using (3.149). It is convenient to do this in polar coordinates,
using the vectors introduced in (3.60) as

er = cos θi + sin θj, eθ = − sin θi + cos θj.

Setting x = r cos θ, y = r sin θ, the velocity field in polar coordinates becomes


v(r, θ) = (1/(2πr²)) [(−qr cos θ − kr sin θ)i + (−qr sin θ + kr cos θ)j]
and ∇ψ transforms according to (3.61) into
∂ψ 1 ∂ψ
∇ψ = er + eθ .
∂r r ∂θ
Condition (3.149) for the stream function now becomes in polar coordinates
0 = (∂ψ/∂r (r, θ) er + (1/r) ∂ψ/∂θ (r, θ) eθ).v(r, θ)
  = (1/(2πr)) (−q ∂ψ/∂r (r, θ) + (k/r) ∂ψ/∂θ (r, θ)).   (3.151)
We see that (3.151) is satisfied if we choose ψ such that
∂ψ/∂r (r, θ) = k/r,   ∂ψ/∂θ (r, θ) = q.   (3.152)
This is indeed possible and we set

ψ(r, θ) = k ln r + qθ. (3.153)


We want to compute r as a function of θ for the streamlines ψ = c. From
k ln r + qθ = c we get
ln r = (1/k)(c − qθ),   r = e^{(c−qθ)/k} = e^{c/k} e^{−qθ/k}.
Since c is an arbitrary constant, we may replace e^{c/k} by c and finally obtain
that
r = c e^{−qθ/k},   c > 0   (3.154)
holds along the streamlines. Thus, the spirals are determined by the value of
q/k.
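That ψ = k ln r + qθ really is a stream function for (3.150) can be confirmed symbolically in Cartesian coordinates, writing θ = atan2(y, x); sympy is our own choice of tool:

```python
# Check that psi = k*ln r + q*theta satisfies grad psi . v = 0 for the
# combined sink and vortex flow (3.150), away from the origin.
import sympy as sp

x, y, q, k = sp.symbols("x y q k", positive=True)
r2 = x**2 + y**2
psi = (k / 2) * sp.log(r2) + q * sp.atan2(y, x)   # k ln r + q theta

v1 = (-q * x - k * y) / (2 * sp.pi * r2)
v2 = (-q * y + k * x) / (2 * sp.pi * r2)

grad_dot_v = sp.diff(psi, x) * v1 + sp.diff(psi, y) * v2
assert sp.simplify(grad_dot_v) == 0   # v is tangent to the level curves of psi
print("stream function check passed")
```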

Modelling of hurricane. Let us assume that the preceding flow model can
be used for a hurricane and that only a single measurement of velocity is
available.
Example 220. Find the strength of the parameters k and q of the flow
model (3.150) for a hurricane from the report that at 20 km distance from
the eye the wind velocity has a component of 15 km/hr towards the eye and
a counterclockwise tangential component of 45 km/hr. Estimate the size of
the hurricane by finding a radius beyond which the wind speed is less than 5
km/hr.

Solution: The velocity component toward the eye is equal to the speed of the
sink part of the flow. We have already seen that the latter is given by q/(2πr),
so at r = 20 we have
15 = q/(2π · 20),   so q = 600π,
with unit km²/hr. The tangential velocity component is equal to the speed of
the vortex part of the flow, and we have already seen that the latter is given
by k/(2πr), so at r = 20 we have
45 = k/(2π · 20),   so k = 1800π,
again with unit km²/hr. To estimate the size, we determine r from the condition
that ‖v‖ = 5 km/hr. Since the tangential and inward velocity components are
perpendicular, we have
5 = ‖v‖ = (1/r) √((q/(2π))² + (k/(2π))²).
Therefore
r = (1/5) √(300² + 900²) = (100/5) √(9 + 81) = 60√10 ≈ 189.7 km.
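The numbers in this example are easy to reproduce programmatically; the sketch below recomputes q, k and the 5 km/hr radius:

```python
# Reproducing the numbers of Example 220.
import math

r0 = 20.0
q = 15.0 * 2 * math.pi * r0      # 15 = q/(2 pi r0)  =>  q = 600 pi
k = 45.0 * 2 * math.pi * r0      # 45 = k/(2 pi r0)  =>  k = 1800 pi

def speed(r):
    # total wind speed: sink and vortex components are perpendicular
    return math.hypot(q / (2 * math.pi * r), k / (2 * math.pi * r))

r5 = math.sqrt((q / (2 * math.pi))**2 + (k / (2 * math.pi))**2) / 5.0
print(round(r5, 1))                       # 189.7, i.e. 60*sqrt(10) km
print(abs(speed(r5) - 5.0) < 1e-9)        # True
```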
3.7 Exercises
3.1. Let x, y, z be vectors of three-dimensional space, let λ be a scalar. Show
that
(i) (x + y) + z = x + (y + z)
(ii) λ(x + y) = λx + λy
(iii) x.(y + z) = x.y + x.z

(iv) λ(x.y) = (λx).y = x.(λy)


(v) x.(x × y) = 0, that is, x × y is perpendicular to x.
(vi) y.(x × y) = 0, that is, x × y is perpendicular to y.
(vii) x × (y + z) = x × y + x × z
(viii) x × (y × z) = (x.z)y − (x.y)z
(ix) x.(y × z) = y.(z × x) = z.(x × y)
(x) x×x=0
3.2. The goal of this exercise is to derive the identity
(x × y).(z × w) = (x.z)(y.w) − (y.z)(x.w)   (∗)
for arbitrary vectors x, y, z and w of three-dimensional space.
(i) Show that (∗) holds for x = y = k and arbitrary vectors z and w.

(ii) Show that (∗) holds for x = k, y = αi + βj and arbitrary vectors z


and w, where α and β are arbitrary scalars.

(iii) Show that (∗) holds for x = k and arbitrary vectors y, z and w.

(iv) Show that (∗) holds for arbitrary vectors x, y, z and w.
3.3. The line in R2 that passes through the point r0 = (x0 , y0 ) and is parallel
to the non-zero vector v = (a, b) = ai + bj has parametric equations
x(t) = x0 + at, y(t) = y0 + bt,
or, in vector form r(t) = (x(t), y(t)),
r(t) = r0 + tv.
In R3 , the vector form is the same, but now r(t) = (x(t), y(t), z(t)), r0 =
(x0 , y0 , z0 ) and v = (a, b, c) and the component equations are
x(t) = x0 + at, y(t) = y0 + bt, z(t) = z0 + ct.
(a) Find the parametric equations of the line
(i) passing through (4, 2) and parallel to v = (−1, 5),


(ii) passing through (1,2,-3) and parallel to v = (4, 5, −7).

(b) Find parametric equations for the line whose vector equation is given
as

(i) r(t) = 2i − 3j + ti − 4tj,


(ii) r(t) = −i + 0j + 2k − ti + 3tj.
3.4. The equation of the plane passing through a point r0 = (x0 , y0 , z0 ) in
R3 and perpendicular to a vector N = (a, b, c), called a normal for the
plane, is given by

a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0,

that is, the plane consists of the points r = (x, y, z) satisfying this equa-
tion. Its vector form is
N.(r − r0 ) = 0.
Find an equation of the plane that passes through the point r0 = (2, 6, 1)
having the vector N = (1, 4, 2) as a normal.
3.5. Let F and G be two vector fields defined on an interval with values in
R3 . Prove that
d dG dF
(a) [F.G] = F. + .G.
dt dt dt
d dG dF
(b) [F × G] = F × + × G.
dt dt dt
(c) Let r = r(t) be a vector valued function with values in R3 such that
‖r(t)‖ = c for some constant c. Show that r(t).r′(t) = r(t). dr/dt (t) = 0,
that is, r(t) is orthogonal to dr/dt (t) = r′(t), for all t.

3.6. (a) Verify explicitly that div(curlF) = 0, where


(i) F = sinh(x − z)i + 2yj + (z − y 2 )k,
(ii) F = x2 i + y 2 j + z 2 k.
(b) Verify explicitly that curl(∇ϕ) = 0, where
(i) ϕ(x, y, z) = −2x3 yz 2 ,
(ii) ϕ(x, y, z) = ex+y+z .
3.7. (a) Let F and G be two vector fields with domain in R3 and values in
R3 . Prove that

div(F × G) = G.(∇ × F ) − F.(∇ × G).


(b) Let ϕ = ϕ(x, y, z) and ψ = ψ(x, y, z) be scalar fields. Prove that


div(∇ϕ × ∇ψ) = 0.

3.8. Let C be a constant vector and r(x, y, z) = xi + yj + zk. Prove that


(a) ∇(C.r) = C,
(b) div(r − C) = 3,
(c) curl(r − C) = 0.
3.9. Evaluate ∫_C F.dr, where F(x, y, z) = i − xj + k and C is parameterized
by r(t) = cos t i − sin t j + t k for 0 ≤ t ≤ π.

3.10. Evaluate the surface integral ∫∫_Σ f dσ, where
(a) f(x, y, z) = y², Σ is the part of the cone z = √(x² + y²) lying in the
first octant and between the planes z = 2 and z = 4.
(b) f(x, y, z) = xyz, Σ is the part of the surface z = 1 + y² for 0 ≤ x ≤
1, 0 ≤ y ≤ 1.
3.11. A particle moves once counterclockwise around the circle of radius 12
centered at the origin under the influence of the force F(x, y, z) = (e^x −
y³ + x cosh x)i + (y² + x)j. Calculate the work done.
3.12. Apply the Green-Ostrogradski theorem to evaluate the line integrals
∫_C F.dr for the following data. The curves are oriented counterclockwise.
(a) F = x2 yi−xy 2 j, C is the boundary of the region defined by x2 +y 2 ≤
4, x ≥ 0 and y ≥ 0.
(b) F = (esin x − y)i + (sinh(y 3 ) − 4x)j, C is the circle of radius 4 with
center (−8, 0).
(c) F = (x2 + y 2 )i + (x2 − y 2 )j, C is the ellipse 4x2 + y 2 = 10.
3.13. Let D be a region bounded by the surface Σ. Evaluate ∫∫_Σ F.n dσ or
∫∫∫_D div(F) dV for the following data, whichever is more convenient.
(a) F = 4xi − 6yj + k, Σ is the surface of the solid cylinder x2 + y 2 ≤ 4,
0 ≤ z ≤ 6 (the surface includes both caps of the cylinder).
(b) F = 2yzi − 4xzj + xyk, Σ is the sphere of radius 6 with center
(−1, 3, 1).
3.14. Find the value of ∫∫_Σ F.n dσ for the data F = 3xyi + z²k, Σ the sphere
of radius 1 centered at the origin.
3.15. Let Σ be a smooth surface enclosing some region D and C be a constant
vector. Show that ∫∫_Σ C.n dσ = 0.
3.16. Evaluate ∫∫_Σ (curlF).n dσ, where F = xyi + yzj + xyk and Σ is the part
of the plane 2x + 4y + z = 8 in the first octant.
3.17. Calculate the circulation of F = (x − y)i + x2 yj + xzak counterclockwise
around the circle x2 + y 2 = 1, where a is a positive constant.
3.18. Examine whether the following vector fields are conservative or not.
(a) F = cosh(x + y)(i + j − k).
(b) F = 2xi − 2yj + 2zk.
(c) F = i − 2j + k.
3.19. Let Σ be the portion of the paraboloid z = 1 − x2 − y 2 for which z ≥ 0,
and let C be the circle x2 + y 2 = 1 that forms the boundary of Σ. Verify
Stokes’ theorem for the vector field

F = (x²y − z²)i + (y³ − x)j + (2x + 3z − 1)k

by evaluating the line integral as well as the surface integral.

3.8 Suggestion for Further Reading


As we know, vector calculus is the backbone of two sets of fundamental
equations: the Maxwell equations governing electrodynamics, and the Navier-
Stokes equations of fluid dynamics, both expressed in terms of gradient,
Laplacian, curl and divergence. Thus everything involving electricity, magnetism
and fluid (or air) flow derives from the gradient, divergence, curl and Laplacian
operators. Vector differentiation operators are also encountered in many
Hamiltonians for Schrödinger wave equations.
More practical and useful situations are antenna/scattering problems,
where we are required to find the vector and scalar potentials due to a current
distribution and to apply them to find the electromagnetic fields of an object.
The concepts of grad, curl and div are required to determine the fields from
the potentials.
Problems involving surface or volume integrals can be transformed to
lower dimensional integrals using the results of this chapter. Around 150 years
ago James Clerk Maxwell first wrote what are now known as the four Maxwell
equations, which show that magnetic and electric fields are essentially two
sides of the same coin. The equations contain the mechanism by which
electromagnetic radiation exists: all light, X-rays, microwaves, radio waves and
gamma rays arise as solutions to the Maxwell equations.
Maxwell Equations
∇.D = ρv
∇.B = 0
∇ × E = −∂B/∂t
∇ × H = ∂D/∂t + J
where ρv, J, H, B, E, D respectively represent electric charge density, electric
current density, magnetizing field, magnetic field, electric field, and
displacement field. Monk [4], Neunzert and Siddiqi [5], Li and Huang [3],
and references therein provide interesting reading material on Maxwell equa-
tions and applications. The Navier-Stokes equations named after Claude-Louis
Navier and George Gabriel Stokes describe the motions of fluid substances.
The vector form of Navier-Stokes equations for incompressible flow of Newton
fluids is:
ρ(∂v/∂t + v.∇v) = −∇p + µ∇²v + f
where f represents other body forces such as gravity and ρ, µ, v, p, respectively
denote density, viscosity, velocity, and pressure. Navier-Stokes equations are
very important as they describe many physical problems. They may be used
to model the weather, ocean current, water flow in a pipe and air flow around
a wing. They help to design aircraft and cars, the studies of blood flow, the de-
sign of power stations, analysis of pollution. Coupled with Maxwell equations,
they can be used to model and study magnetohydrodynamics.
The Navier-Stokes equations pose challenging problems in a purely
mathematical sense, as shown by the U.S. $1 million prize offered by the Clay
Mathematics Institute for proving or disproving the existence of smooth
solutions in three dimensions. Books by Doering and Gibbon [1], Galdi [2], and
Temam [6] provide detailed accounts; interested readers should consult them
to gain deeper knowledge about the Navier-Stokes equations.
Bibliography

[1] C. R. Doering, J. D. Gibbon, Applied Analysis of the Navier-Stokes Equa-


tions, Cambridge University Press, 1995.

[2] G. P. Galdi, An Introduction to the Mathematical Theory of the Navier-
Stokes Equations, Second Edition, Springer, 2011.
[3] J. Li, Y. Huang, Time-Domain Finite Element Methods for Maxwell
Equations in Metamaterials, Series in Computational Mathematics,
Springer, 2013.
[4] P. Monk, Finite Element Methods for Maxwell Equations, Oxford Science
Publications, 2004.
[5] H. Neunzert, A.H. Siddiqi, Topics in Industrial Mathematics: Case Stud-
ies, Kluwer Academic Publisher, 2000. Springer electronic version 2014.

[6] R. Temam, Navier-Stokes Equations: Theory and Numerical Analysis,


AMS Chelsea Publishing, 2000.
[7] A. H. Siddiqi, P. Manchanda, M. Brokate, Calculus with Applications, IK
International, 2011.

[8] D. G. Zill and W. S. Wright, Advanced Engineering Mathematics, Jones


and Bartlett Publications, 2012.

Chapter 4
Fourier Methods and Integral
Transforms

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337


4.2 Orthonormal Systems and Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . 338
4.2.1 Orthonormal Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
4.2.2 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
4.2.3 Further Properties of Fourier Series . . . . . . . . . . . . . . . . . . . . . 357
4.3 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
4.3.1 Basic Properties of Fourier Transform . . . . . . . . . . . . . . . . . . 367
4.3.2 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
4.3.3 Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
4.4 Integral Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
4.5 Sturm-Liouville Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
4.5.1 Regular Sturm-Liouville Problems . . . . . . . . . . . . . . . . . . . . . . 378
4.6 Application of Fourier Methods to Signal Analysis . . . . . . . . . . . . . . 381
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
4.8 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

4.1 Introduction
Fourier methods or Fourier analysis constitute a branch of mathematics
developed formally some 150 years after Newton’s and Leibniz’ calculus and
heavily depend on integral and differential calculus. Jean Baptiste Joseph
Fourier was born in 1768 in Auxerre, a town between Paris and Dijon. He
became fascinated by mathematics at the age of 13. After the French Rev-
olution, Fourier taught in Paris, then accompanied Napoleon to Egypt and
served as permanent secretary of the Institute of Egypt. He wrote a book on
Egypt and in certain quarters he is famous more as an Egyptologist than for
his contributions to mathematics and physics. In the world of science he is
famous for, among other things, the ideas he set forth in a memoir in 1807
and published in 1822 in his book in French entitled The Analytic Theory of
Heat.
Fourier analysis shows that we can represent periodic functions, even very
jagged and irregular-looking ones, in the form of a finite or infinite sum of sine


and cosine functions, called Fourier series. Non-periodic functions can be


treated with the Fourier transform. Fourier showed how these mathemat-
ical tools can be used to study natural phenomena such as heat diffusion,
making it possible to solve equations that had until then remained intractable.
Under the action of the Fourier transform, derivatives are transformed into
multiplications, thus turning differential equations into equations containing
algebraic expressions. In this way, many important differential equations are
transformed into equations which are much easier to study and solve.
If a phenomenon is described by a function of time or space, Fourier anal-
ysis tells us, loosely speaking, how much of each frequency this phenomenon
contains. In many cases, this frequency information is not simply a mathemat-
ical trick to make calculations easier, but corresponds to relevant properties
of the phenomenon under study.
During the second half of the 20th century, Fourier analysis was refined
in various directions in order to make it easier and more efficient to transmit,
compress and analyze information or to separate information from surround-
ing noise. One of those techniques, the wavelet transform invented during the
1980s, will be presented in Chapter 10.
Fourier analysis and its descendants have been applied to a broad range of
areas of science and engineering, for example, telecommunications and signal
processing, physics, imaging in the biomedical sciences (EEG, ECG, CAT,
MRI, NMR), meteorology, oceanography, seismology, economics and finance.
A few of them will be touched in this chapter. Interested readers may find
details of some of these applications in Prestini [4] and Cartwright [2]. There
are also many interactions within mathematics in areas as diverse as statistics
and number theory.
For several reasons, we have chosen not to present proofs (detailed or
sketched) for the majority of results of this chapter, and we refer the reader
to the literature on the subject.
Finally, we want to acknowledge the contributions of Dr. A. K. Verma to
this chapter, whose program we used to illustrate the partial sums of several
of the Fourier series discussed in this Chapter.

4.2 Orthonormal Systems and Fourier Series


4.2.1 Orthonormal Systems
In Chapter 3 we introduced the concept of orthogonal (or perpendicular)
vectors. Namely, two vectors u and v are called orthogonal if their scalar
product (dot product) u · v equals zero. Recall also that u · u = ‖u‖² > 0 if
u ≠ 0. We now extend this concept in two respects. We define orthogonality

for systems of more than two vectors, and we define orthogonality for functions
instead of vectors.
Throughout this chapter, the functions considered will be piecewise con-
tinuous. Such functions are formed by putting together continuous functions
defined on separate intervals, for example the sign function or the greatest
integer function as considered in calculus books, or the “sawtooth” function
defined by
f(x) = x − k,   if k − 1/2 ≤ x < k + 1/2, k an integer.   (4.1)
The formal definition runs as follows. A function f defined on an interval
I = [a, b] is called piecewise continuous, if we can decompose I into finitely
many subintervals Ik = [xk−1 , xk ] such that f is continuous in the interior of
Ik and the one-sided limits lim_{x→x_{k−1}+} f(x) and lim_{x→x_k−} f(x) exist for all
such subintervals. If the domain of f is unbounded, we require this property
to hold for every bounded interval in the domain. As a consequence, we can
integrate piecewise continuous functions, and
∫_a^b f(x) dx = Σ_k ∫_{x_{k−1}}^{x_k} f(x) dx.

Definition 76. (Orthogonal functions)


Let f1 , f2 be piecewise continuous real-valued functions defined on an interval
[a, b]. Their scalar product (or inner product) is denoted by hf1 , f2 i and
defined by
⟨f1, f2⟩ = ∫_a^b f1(x) f2(x) dx.   (4.2)
The functions f1 and f2 are said to be orthogonal on [a, b] if
⟨f1, f2⟩ = ∫_a^b f1(x) f2(x) dx = 0.   (4.3)

Example 221. (a) The functions f1 (x) = ex and f2 (x) = sin x are orthogo-
nal on the interval [π/4, 5π/4].
(b) The functions f1 (x) = x and f2 (x) = cos 2x are orthogonal on the interval
[−π/2, π/2].
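Both claims in Example 221 reduce to definite integrals that vanish; a quick symbolic check (sympy is our own choice of tool):

```python
# Checking Example 221 symbolically: both inner products vanish.
import sympy as sp

x = sp.symbols("x")
ip_a = sp.integrate(sp.exp(x) * sp.sin(x), (x, sp.pi / 4, 5 * sp.pi / 4))
ip_b = sp.integrate(x * sp.cos(2 * x), (x, -sp.pi / 2, sp.pi / 2))
assert sp.simplify(ip_a) == 0   # (a): <e^x, sin x> = 0 on [pi/4, 5pi/4]
assert sp.simplify(ip_b) == 0   # (b): odd integrand on a symmetric interval
print("Example 221 verified")
```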
In analogy to the case of vectors, for functions f : [a, b] → R we define
‖f‖ = √⟨f, f⟩ = (∫_a^b f(x)² dx)^{1/2}   (4.4)

and call it the norm of f. (We might also call ‖f‖ the length of f, although
its geometric meaning is no longer obvious.) We then have
‖f‖² = ⟨f, f⟩ = ∫_a^b f(x)² dx.
As in the case of vectors, we have ‖0‖ = 0 for the zero function. On the other
hand, a piecewise continuous function f satisfies ‖f‖ = 0 only if f is the zero
function; in other words, if f is nonzero, we must have ‖f‖ > 0.
If two nonzero functions f1 and f2 are orthogonal, we say that the set
{f1 , f2 } formed by those two functions is an orthogonal system. If we have
more than two functions, we require that the functions are mutually (or pair-
wise) orthogonal in the sense of the following definition.
Definition 77. (Orthogonal system, orthonormal system)
A system (or set) consisting of finitely or infinitely many nonzero functions
f1 , f2 , . . . is said to be orthogonal on the interval [a, b] if
⟨fm, fn⟩ = ∫_a^b fm(x) fn(x) dx = 0,
whenever m ≠ n.
If moreover
⟨fn, fn⟩ = 1 = ‖fn‖, for all n,
the system is called orthonormal on [a, b].
Remark 47. An orthogonal system {f1 , f2 , . . . } can be made into an or-
thonormal system by replacing each function fn with its scalar multiple
fn/‖fn‖.
Example 222. (a) The set {1, cos x, cos 2x, . . . , cos nx} is an orthogonal sys-
tem on [−π, π].
(b) The set {1/√(2π), cos x/√π, · · · , cos nx/√π} is an orthonormal system on [−π, π].
(c) Examine whether the following systems are orthonormal or not on the
intervals indicated.
(i) {sin x, sin 3x, sin 5x, · · · }, I = [0, π/2],
(ii) {sin nx : n = 1, 2, 3, . . . }, I = [0, π],
(iii) {1, cos(nπx/l) : n = 1, 2, 3, . . . }, I = [0, l],
(iv) {1, cos(nπx/l), sin(mπx/l) : n, m = 1, 2, 3, . . . }, I = [−l, l].
Solution: We discuss here part (c).
(i) We set fn(x) = sin(2n + 1)x. For m ≠ n, we obtain
⟨fn, fm⟩ = ∫_0^{π/2} sin(2n + 1)x sin(2m + 1)x dx
         = (1/2) ∫_0^{π/2} [cos 2(n − m)x − cos 2(n + m + 1)x] dx
(using the trigonometric identity 2 sin Ax sin Bx = cos(A − B)x − cos(A + B)x)
         = (1/2) [sin 2(n − m)x / (2(n − m))]_0^{π/2} − (1/2) [sin 2(n + m + 1)x / (2(n + m + 1))]_0^{π/2}
         = 0 − 0 = 0.

For n = m we get
∫_0^{π/2} sin²(2n + 1)x dx = ∫_0^{π/2} (1/2 − (1/2) cos 2(2n + 1)x) dx
(using the identity 2 sin² Ax = 1 − cos 2Ax)
= (1/2) ∫_0^{π/2} dx − (1/2) ∫_0^{π/2} cos 2(2n + 1)x dx
= [x/2]_0^{π/2} − (1/2) [sin 2(2n + 1)x / (2(2n + 1))]_0^{π/2}
= π/4.
Hence the given system is orthogonal but not orthonormal. However, if we
multiply each function by 2/√π, the new system obtained in this way is
orthonormal, that is,
{2 sin x/√π, 2 sin 3x/√π, 2 sin 5x/√π, . . . , 2 sin(2n + 1)x/√π, · · · }
is an orthonormal system on [0, π/2].


(ii) Set fn(x) = sin nx. For m ≠ n we have
⟨fn, fm⟩ = ∫_0^π sin nx sin mx dx = (1/2) ∫_0^π [cos(n − m)x − cos(n + m)x] dx
(by the trigonometric identity 2 sin Ax sin Bx = cos(A − B)x − cos(A + B)x)
= (1/2) [sin(n − m)x / (n − m)]_0^π − (1/2) [sin(n + m)x / (n + m)]_0^π = 0.
For m = n we have
⟨fn, fn⟩ = ∫_0^π sin² nx dx = ∫_0^π (1/2 − (1/2) cos 2nx) dx
= π/2 − (1/(4n)) [sin 2nx]_0^π = π/2,
where again we used 2 sin² Ax = 1 − cos 2Ax. Hence {sin nx} is an orthogonal,
but not an orthonormal system on [0, π]. But the system {√2 sin nx/√π} is an
orthonormal system on [0, π].
(iii) For m ≠ n we have
∫_0^l cos(nπx/l) cos(mπx/l) dx = (1/2) ∫_0^l [cos((n − m)πx/l) + cos((n + m)πx/l)] dx
(by the trigonometric identity 2 cos Ax cos Bx = cos(A − B)x + cos(A + B)x)
= (l/(2(n − m)π)) [sin((n − m)πx/l)]_0^l + (l/(2(n + m)π)) [sin((n + m)πx/l)]_0^l
= 0 + 0 = 0.
For m = n we have
∫_0^l cos²(nπx/l) dx = ∫_0^l (1/2 + (1/2) cos(2nπx/l)) dx
= [x/2]_0^l + (l/(4nπ)) [sin(2nπx/l)]_0^l = l/2.
Moreover,
∫_0^l 1 · cos(nπx/l) dx = (l/(nπ)) [sin(nπx/l)]_0^l = 0,   ∫_0^l 1² dx = l.
The given system is orthogonal but not orthonormal. However, the system
{1/√l, (√2/√l) cos(nπx/l)} is orthonormal.
(iv) We know already from part (iii) that
∫_{−l}^l cos(nπx/l) cos(mπx/l) dx = 2 ∫_0^l cos(nπx/l) cos(mπx/l) dx = 0.
Concerning the sine, for m ≠ n we get
∫_{−l}^l sin(nπx/l) sin(mπx/l) dx = 2 ∫_0^l sin(nπx/l) sin(mπx/l) dx
(by the identity 2 sin Ax sin Bx = cos(A − B)x − cos(A + B)x)
= ∫_0^l [cos((n − m)πx/l) − cos((n + m)πx/l)] dx
= (l/((n − m)π)) [sin((n − m)πx/l)]_0^l − (l/((n + m)π)) [sin((n + m)πx/l)]_0^l
= 0 − 0 = 0.
Using the same identities as in the previous computations one derives that
∫_{−l}^l sin²(nπx/l) dx = l,   ∫_{−l}^l cos²(nπx/l) dx = l,
∫_{−l}^l sin(nπx/l) cos(mπx/l) dx = 0, for all n, m,
moreover
∫_{−l}^l 1 · cos(nπx/l) dx = 0,   ∫_{−l}^l 1 · sin(nπx/l) dx = 0,   ∫_{−l}^l 1² dx = 2l.
The given system is orthogonal but not orthonormal. However, the system
{1/√(2l), (1/√l) cos(nπx/l), (1/√l) sin(nπx/l)} is orthonormal.
Example 223. (a) Haar systems The Haar system {hn} is defined as
follows. Let I(p, n) denote the dyadic interval [p/2^n, (p + 1)/2^n).
Set h0 = 1. For n, k ∈ N with 0 ≤ k < 2^n define h_{2^n+k} on [0, 1] by
h_{2^n+k}(x) = 2^{n/2},  if x ∈ I(2k, n + 1),
h_{2^n+k}(x) = −2^{n/2}, if x ∈ I(2k + 1, n + 1),
h_{2^n+k}(x) = 0,        otherwise,
and h_{2^n+k}(x + 1) = h_{2^n+k}(x).


Each Haar function is continuous from the right and the Haar system {hn } is
orthonormal on [0,1).
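The definition above translates directly into code. The sketch below is an illustration that assumes the standard convention for the dyadic intervals, $I(p, m) = [p\,2^{-m}, (p+1)\,2^{-m})$, which the text does not restate here; it then checks orthonormality of a few Haar functions by midpoint sampling on a dyadic grid (exact, because the functions are constant on such intervals):

```python
def haar(j, x):
    """Haar function h_j on [0, 1), assuming I(p, m) = [p/2**m, (p+1)/2**m)."""
    if j == 0:
        return 1.0
    n = j.bit_length() - 1        # j = 2**n + k with 0 <= k < 2**n
    k = j - 2 ** n
    x = x % 1.0                   # periodic extension with period 1
    scale = 2.0 ** (n / 2)
    if 2 * k / 2 ** (n + 1) <= x < (2 * k + 1) / 2 ** (n + 1):
        return scale
    if (2 * k + 1) / 2 ** (n + 1) <= x < (2 * k + 2) / 2 ** (n + 1):
        return -scale
    return 0.0

# inner products on [0, 1) by midpoint sampling on a fine dyadic grid
N = 1024
def inner(i, j):
    return sum(haar(i, (m + 0.5) / N) * haar(j, (m + 0.5) / N) for m in range(N)) / N

print(inner(1, 1), inner(1, 2), inner(3, 5))
```

The printed inner products are $1, 0, 0$, consistent with orthonormality.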
(b) Walsh systems. Let $\mathbb{N}$ denote the set of non-negative integers, $\mathbb{P}$ the set of positive integers, $\mathbb{R}$ the set of real numbers, $\mathbb{C}$ the set of complex numbers and $\mathbb{Q}$ the set of rationals in the unit interval $[0,1)$. Thus each dyadic element of $\mathbb{Q}$ has the form $\frac{p}{2^n}$ for some $p, n \in \mathbb{N}$, $0 \le p < 2^n$. During the discussion we will use the notation $x = (x_\alpha,\ \alpha \in A)$ to represent a collection $x$ indexed by a set $A$. Let $r$ be the function defined on $[0,1]$ by
$$r(x) = \begin{cases} 1, & 0 \le x < 1/2, \\ -1, & 1/2 \le x < 1, \end{cases} \tag{4.5}$$
$$r(x+1) = r(x) \quad \text{(extended to } \mathbb{R} \text{ with period } 1\text{)}.$$


The Rademacher system $r = (r_n,\ n \in \mathbb{N})$ is defined by
$$r_n(x) = r(2^n x), \qquad x \in \mathbb{R},\ n \in \mathbb{N}. \tag{4.6}$$
Given $n \in \mathbb{N}$ it is possible to write
$$n = \sum_{k=0}^{\infty} n_k 2^k, \tag{4.7}$$
344 Modern Engineering Mathematics

where $n_k = 0$ or $1$ for $k \in \mathbb{N}$. The expression will be called the binary expansion of $n$ and the numbers $n_k$ will be called the binary coefficients of $n$.
There are three types of Walsh functions, namely the original Walsh system, the Walsh-Kaczmarz system and the Walsh-Paley system. We discuss here the Walsh-Paley system, invented in 1932, and we will call it the Walsh system. Let $n \in \mathbb{N}$ have binary coefficients $n_k$, $k \in \mathbb{N}$; then
$$w_n = \prod_{k=0}^{\infty} (r_k)^{n_k}. \tag{4.8}$$

We observe that the product is always finite because $n_k = 0$ for $k$ sufficiently large, and by definition $w_0 = 1$ and $w_{2^n} = r_n$ for $n \in \mathbb{N}$. Thus the Walsh system contains the Rademacher system as a special case and enjoys many properties analogous to the classical trigonometric, Sturm-Liouville and Legendre systems. It is also clear that the Walsh system is closed under finite products and that every Walsh function is piecewise constant with finitely many jump discontinuities on $[0,1)$, taking only the values $+1$ and $-1$. It may be noted that the Walsh system is complete and orthonormal. Every function $f(x)$ which is periodic with period 1 and Lebesgue integrable on $[0,1]$ can be expanded in a Walsh-Fourier series
$$f(x) \sim \sum_{n=0}^{\infty} c_n w_n(x), \qquad \text{where } c_n = \int_0^1 f(x) w_n(x)\,dx,\ n = 0, 1, 2, \ldots$$
Many properties of Walsh-Fourier series are analogous to those of trigonometric Fourier series; for example, the Dirichlet kernels for the Walsh system and the trigonometric system have the same order of growth. The theory of Walsh-Fourier series is quite rich. It has been shown that Walsh analysis is a special case of harmonic analysis on compact Abelian groups.
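The construction (4.6)-(4.8) can be sketched in a few lines of Python (an illustration, not from the text): $w_n$ is built as the product of the Rademacher functions selected by the binary digits of $n$.

```python
def rademacher(n, x):
    # r_n(x) = r(2**n x), where r = +1 on [0, 1/2), -1 on [1/2, 1), period 1
    t = (2 ** n * x) % 1.0
    return 1.0 if t < 0.5 else -1.0

def walsh(n, x):
    """Walsh-Paley function w_n, a direct transcription of Eq. (4.8)."""
    w, k = 1.0, 0
    while n > 0:
        if n & 1:                  # binary coefficient n_k = 1
            w *= rademacher(k, x)
        n >>= 1
        k += 1
    return w

# w_0 = 1 and w_{2**n} = r_n, as stated above:
print(walsh(0, 0.3), walsh(4, 0.3) == rademacher(2, 0.3))

# orthogonality of w_3 and w_5 on [0, 1), by midpoint sampling on a dyadic grid:
N = 256
dot = sum(walsh(3, (m + 0.5) / N) * walsh(5, (m + 0.5) / N) for m in range(N)) / N
print(dot)
```

Since $w_3 w_5 = r_1 r_2 = w_6$, which integrates to zero, the sampled inner product is exactly $0$.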
Remark 48. The systems named after Rademacher, Haar and Walsh are
other well-known orthonormal systems.
A function $f$ is said to be a linear combination of the functions $f_1, f_2, \ldots, f_n$ if
$$f = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n$$
holds for suitably chosen scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$.
Definition 78. (Linear dependence and independence)
A system $\{f_1, f_2, \ldots, f_n\}$ of functions is said to be linearly independent if $\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n = 0$ implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. The system is called linearly dependent if it is not linearly independent; in other words, at least one element of the system is a linear combination of the remaining $n-1$ elements. An infinite system $\{f_1, f_2, \ldots\}$ of functions is said to be linearly independent if every finite subset taken from it forms a linearly independent system in the sense above, and it is said to be linearly dependent if this is not the case.

Remark 49. Every orthogonal (and, hence, every orthonormal) system is linearly independent. However, the converse is not true, so a system may be linearly independent without being orthogonal.

4.2.2 Fourier Series

A series of the form
$$\frac{1}{2}a_0 + (a_1\cos x + b_1\sin x) + (a_2\cos 2x + b_2\sin 2x) + \cdots,$$
that is,
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty} \big(a_n\cos nx + b_n\sin nx\big), \tag{4.9}$$
is called an (infinite) trigonometric series.


Let $f$ be a piecewise continuous function defined on the interval $[-\pi, \pi]$. The numbers
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx, \tag{4.10}$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx, \tag{4.11}$$
where $n = 1, 2, 3, \ldots$, are called the Fourier coefficients (more precisely, the Fourier cosine resp. sine coefficients) of $f$ on the interval $[-\pi, \pi]$, and the series (4.9) with the coefficients from (4.10) and (4.11) is called the Fourier series of $f$ on this interval. Note that $a_0/2$ is just the average of the function $f$ over $[-\pi, \pi]$.
The Fourier coefficients arise when we want to represent a function $f : [-\pi, \pi] \to \mathbb{R}$ as a trigonometric series,
$$f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} \big(a_n\cos nx + b_n\sin nx\big), \tag{4.12}$$
for the following reason. For an arbitrary trigonometric series, we define the partial sums
$$s_N(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N} \big(a_n\cos nx + b_n\sin nx\big). \tag{4.13}$$
(We remind the reader that we have studied series and their partial sums in Chapter 5.) Now let us compute, for $N \ge n \ge 1$, using the orthogonality of the system $\{\cos nx, \sin nx : n = 1, 2, \ldots\}$ on $[-\pi, \pi]$,
$$\int_{-\pi}^{\pi} s_N(x)\cos nx\,dx = \frac{1}{2}a_0\int_{-\pi}^{\pi} \cos nx\,dx + \sum_{k=1}^{N} a_k\int_{-\pi}^{\pi} \cos nx\cos kx\,dx$$
$$+ \sum_{k=1}^{N} b_k\int_{-\pi}^{\pi} \cos nx\sin kx\,dx = 0 + a_n\pi + 0 = \pi a_n. \tag{4.14}$$

If $f$ can be represented as in (4.12), that is, if
$$f(x) = \lim_{N\to\infty} s_N(x),$$
and if moreover
$$\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \lim_{N\to\infty}\int_{-\pi}^{\pi} s_N(x)\cos nx\,dx,$$
then we see from (4.14) that
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx.$$
Similarly, we get the corresponding formula for $b_n$ by using $\sin nx$ instead of $\cos nx$ in (4.14).
Instead of the interval $[-\pi, \pi]$ we may consider an interval $[-l, l]$, where $l > 0$ is an arbitrary number.
Definition 79. (Fourier series)
Let $f$ be a piecewise continuous function defined on the interval $[-l, l]$, where $l > 0$. Then the Fourier series of $f$ on $[-l, l]$ is given by
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} \Big(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\Big), \qquad x \in [-l, l],$$
where
$$a_0 = \frac{1}{l}\int_{-l}^{l} f(x)\,dx, \tag{4.15}$$
$$a_n = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx, \qquad b_n = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx. \tag{4.16}$$
If the Fourier series at $x$ converges to $f(x)$, we write as usual
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \Big(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\Big). \tag{4.17}$$
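Definition 79 can be turned into a small numerical routine. The following Python sketch (illustrative, not part of the text) approximates the integrals (4.15) and (4.16) with the trapezoidal rule and compares the result with the known coefficients of $f(x) = x$ on $[-1, 1]$, for which $a_n = 0$ and $b_n = 2(-1)^{n+1}/(n\pi)$:

```python
import math

def trapezoid(f, a, b, n=4000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def fourier_coeffs(f, l, nmax):
    """a_0 and the first nmax pairs (a_n, b_n) of Definition 79, by quadrature."""
    a0 = trapezoid(f, -l, l) / l
    a = [trapezoid(lambda x, n=n: f(x) * math.cos(n * math.pi * x / l), -l, l) / l
         for n in range(1, nmax + 1)]
    b = [trapezoid(lambda x, n=n: f(x) * math.sin(n * math.pi * x / l), -l, l) / l
         for n in range(1, nmax + 1)]
    return a0, a, b

# f(x) = x on [-1, 1] is odd, so a_n = 0 and b_n = 2(-1)**(n+1)/(n*pi)
a0, a, b = fourier_coeffs(lambda x: x, 1.0, 3)
print(b[0], 2 / math.pi)
```

The computed $b_1$ agrees with $2/\pi$ to the accuracy of the quadrature rule.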

The following theorem gives conditions under which (4.17) holds, that is, when the Fourier series of $f$ on $[-l, l]$ converges to $f$.

Theorem 45. (Dirichlet convergence theorem)
Let $f$ and $f'$ be piecewise continuous on the interval $[-l, l]$, and let $x \in (-l, l)$.
(i) If $f$ is continuous at $x$, the Fourier series of $f$ at $x$ converges to $f(x)$.
(ii) If $f$ is discontinuous at $x$, the Fourier series of $f$ at $x$ converges to the mean value
$$\frac{f(x+) + f(x-)}{2},$$
where $f(x+)$ and $f(x-)$ denote the right- resp. left-hand limit of $f$ at $x$.

Note that in fact (i) can be viewed as a special case of (ii), since $f(x) = f(x+) = f(x-)$ whenever $f$ is continuous at $x$.
Example 224. Expand the following functions in Fourier series.
(a) $f(x) = \begin{cases} 0, & -\pi < x < 0, \\ \pi - x, & 0 \le x < \pi, \end{cases}$ on $[-\pi, \pi]$,
(b) $f(x) = \begin{cases} 0, & -\pi < x < 0, \\ 1, & 0 \le x < \pi, \end{cases}$ on $[-\pi, \pi]$,
(c) $f(x) = \begin{cases} 0, & -1 < x < 0, \\ x, & 0 \le x < 1, \end{cases}$ on $[-1, 1]$,
(d) $f(x) = \begin{cases} 0, & -\pi < x < 0, \\ \sin x, & 0 \le x < \pi, \end{cases}$ on $[-\pi, \pi]$,
(e) $f(x) = \begin{cases} -\frac{\pi}{4}, & -\pi < x < 0, \\ 0, & x = 0, \\ \frac{\pi}{4}, & 0 < x < \pi, \end{cases}$ on $[-\pi, \pi]$,
(f) $f(x) = \begin{cases} -1, & -4 < x < 0, \\ 1, & 0 \le x < 4, \end{cases}$ on $[-4, 4]$,
(g) $f(x) = e^x$ on $[-\pi, \pi]$.
Solution: (a) Here we have $l = \pi$. We get
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\Big[\int_{-\pi}^{0} 0\,dx + \int_0^{\pi} (\pi - x)\,dx\Big] = \frac{1}{\pi}\Big[\pi x - \frac{x^2}{2}\Big]_0^{\pi} = \frac{\pi}{2},$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{1}{\pi}\int_0^{\pi} (\pi - x)\cos nx\,dx$$
$$= \frac{1}{\pi}\Big[(\pi - x)\frac{\sin nx}{n}\Big]_0^{\pi} + \frac{1}{n\pi}\int_0^{\pi} \sin nx\,dx = \frac{1}{n\pi}\Big[-\frac{\cos nx}{n}\Big]_0^{\pi} = \frac{-\cos n\pi + 1}{n^2\pi} = \frac{1 - (-1)^n}{n^2\pi}.$$
Similarly we can calculate
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = \frac{1}{\pi}\int_0^{\pi} (\pi - x)\sin nx\,dx = \frac{1}{n}.$$
Since the given function is continuous except at $x = 0$, the Fourier series converges. See the graphs of the function and of a partial sum of its Fourier series in Figures 4.1a and 4.1b respectively.
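With the coefficients just computed, one can evaluate the partial sums and watch Theorem 45 at work: at the jump $x = 0$ the series converges to $(f(0+) + f(0-))/2 = \pi/2$, while at a point of continuity it converges to $f(x)$. A short illustrative Python sketch:

```python
import math

def s_N(x, N):
    """Partial sum with the coefficients of part (a):
    a0 = pi/2, a_n = (1 - (-1)**n)/(n**2 * pi), b_n = 1/n."""
    total = math.pi / 4
    for n in range(1, N + 1):
        total += ((1 - (-1) ** n) / (n ** 2 * math.pi)) * math.cos(n * x)
        total += math.sin(n * x) / n
    return total

# at the jump x = 0 the series converges to (f(0+) + f(0-))/2 = pi/2,
# at the point of continuity x = 1 it converges to f(1) = pi - 1
print(s_N(0.0, 2000), math.pi / 2)
print(s_N(1.0, 2000), math.pi - 1)
```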

FIGURE 4.1a: Function

FIGURE 4.1b: Fourier Approximation Plot for 5 Harmonics

(b) We have
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\Big[\int_{-\pi}^{0} 0\,dx + \int_0^{\pi} 1\,dx\Big] = 1,$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{1}{\pi}\int_0^{\pi} \cos nx\,dx = \frac{1}{\pi}\Big[\frac{\sin nx}{n}\Big]_0^{\pi} = 0,$$
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = \frac{1}{\pi}\int_0^{\pi} \sin nx\,dx = \frac{1}{\pi}\Big[-\frac{\cos nx}{n}\Big]_0^{\pi}$$
$$= \frac{1}{\pi}\Big[-\frac{\cos n\pi}{n} + \frac{1}{n}\Big] = \frac{1}{n\pi}\big(1 - (-1)^n\big), \qquad \text{as } \cos n\pi = (-1)^n.$$
Since the given function is continuous except at $x = 0$, the Fourier series converges to the given function for $x \neq 0$, and we have
$$f(x) = \frac{1}{2} + \frac{1}{\pi}\sum_{n=1}^{\infty} \frac{1 - (-1)^n}{n}\sin nx.$$

For the graphs of f and of a partial sum of its Fourier series see Figures 4.2a
and 4.2b respectively.

FIGURE 4.2a: Function

FIGURE 4.2b: Fourier Approximation Plot for 5 Harmonics

(c) Here $l = 1$. The given function is continuous also at $x = 0$, hence its Fourier series converges everywhere. We get
$$a_0 = \int_{-1}^{1} f(x)\,dx = \int_{-1}^{0} 0\,dx + \int_0^1 x\,dx = \Big[\frac{x^2}{2}\Big]_0^1 = \frac{1}{2},$$
$$a_n = \int_{-1}^{1} f(x)\cos n\pi x\,dx = \int_0^1 x\cos n\pi x\,dx = \Big[\frac{x\sin n\pi x}{n\pi}\Big]_0^1 - \int_0^1 \frac{\sin n\pi x}{n\pi}\,dx$$
$$= 0 + \Big[\frac{\cos n\pi x}{n^2\pi^2}\Big]_0^1 = \frac{(-1)^n - 1}{n^2\pi^2},$$
and similarly
$$b_n = \int_{-1}^{1} f(x)\sin n\pi x\,dx = \int_0^1 x\sin n\pi x\,dx = \frac{(-1)^{n+1}}{n\pi},$$
through integration by parts. See Figures 4.3a and 4.3b for the graphs of the function and of a partial sum of its Fourier series.

FIGURE 4.3a: Function


FIGURE 4.3b: Fourier Approximation Plot for 5 Harmonics

(d) We have
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_0^{\pi} \sin x\,dx = \frac{1}{\pi}\big[-\cos x\big]_0^{\pi} = \frac{2}{\pi},$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{1}{\pi}\int_0^{\pi} \sin x\cos nx\,dx = \frac{1}{2\pi}\int_0^{\pi} \big[\sin(n+1)x + \sin(1-n)x\big]\,dx$$
(by the identity $2\sin Ax\cos Bx = \sin(A+B)x + \sin(A-B)x$)
$$= \frac{1}{2\pi}\Big[\frac{-\cos(n+1)x}{n+1}\Big]_0^{\pi} + \frac{1}{2\pi}\Big[\frac{-\cos(1-n)x}{1-n}\Big]_0^{\pi}$$
$$= \frac{1}{2\pi}\Big[\frac{-\cos(n+1)\pi + 1}{n+1}\Big] + \frac{1}{2\pi}\Big[\frac{-\cos(1-n)\pi + 1}{1-n}\Big] = \frac{1 + (-1)^n}{\pi(1 - n^2)}, \qquad n = 2, 3, 4, \ldots,$$
$$a_1 = \frac{1}{2\pi}\int_0^{\pi} \sin 2x\,dx = 0,$$
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = \frac{1}{\pi}\int_0^{\pi} \sin x\sin nx\,dx = \frac{1}{2\pi}\int_0^{\pi} \big[\cos(1-n)x - \cos(1+n)x\big]\,dx = 0, \qquad n = 2, 3, 4, \ldots,$$
$$b_1 = \frac{1}{2\pi}\int_0^{\pi} (1 - \cos 2x)\,dx = \frac{1}{2}.$$
Thus,
$$f(x) = \frac{1}{\pi} + \frac{1}{2}\sin x + \sum_{n=2}^{\infty} \frac{1 + (-1)^n}{\pi(1 - n^2)}\cos nx$$
is the Fourier series for $f$. See the graph of the function in Figure 4.4a and the graph of a partial sum of its Fourier series in Figure 4.4b.

FIGURE 4.4a: Function

(e) We have
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{0} \Big(-\frac{\pi}{4}\Big)dx + \frac{1}{\pi}\int_0^{\pi} \frac{\pi}{4}\,dx = -\frac{\pi}{4} + \frac{\pi}{4} = 0.$$
FIGURE 4.4b: Fourier Approximation Plot for 5 Harmonics

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = -\frac{1}{\pi}\cdot\frac{\pi}{4}\int_{-\pi}^{0} \cos nx\,dx + \frac{1}{\pi}\cdot\frac{\pi}{4}\int_0^{\pi} \cos nx\,dx$$
$$= -\frac{1}{4}\Big[\frac{\sin nx}{n}\Big]_{-\pi}^{0} + \frac{1}{4}\Big[\frac{\sin nx}{n}\Big]_0^{\pi} = 0,$$
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = -\frac{1}{4}\int_{-\pi}^{0} \sin nx\,dx + \frac{1}{4}\int_0^{\pi} \sin nx\,dx$$
$$= \frac{1}{4}\Big[\frac{\cos nx}{n}\Big]_{-\pi}^{0} - \frac{1}{4}\Big[\frac{\cos nx}{n}\Big]_0^{\pi} = \frac{1}{4n} - \frac{\cos n\pi}{4n} - \frac{\cos n\pi}{4n} + \frac{1}{4n} = \frac{2}{4n} - \frac{2(-1)^n}{4n} = \frac{1 - (-1)^n}{2n}.$$
Thus, the Fourier series for $f$ is
$$f(x) = \sum_{n=1}^{\infty} \frac{1}{2n-1}\sin(2n-1)x.$$
The graph of this function and of a partial sum of its Fourier series are given in Figures 4.5a and 4.5b respectively.

FIGURE 4.5a: Function

FIGURE 4.5b: Fourier Approximation Plot for 5 Harmonics



(f) We have
$$a_0 = \frac{1}{4}\int_{-4}^{4} f(x)\,dx = \frac{1}{4}\int_{-4}^{0} (-1)\,dx + \frac{1}{4}\int_0^{4} 1\,dx = \frac{1}{4}\big[-x\big]_{-4}^{0} + \frac{1}{4}\big[x\big]_0^{4} = -1 + 1 = 0,$$
$$a_n = -\frac{1}{4}\int_{-4}^{0} \cos\frac{n\pi x}{4}\,dx + \frac{1}{4}\int_0^{4} \cos\frac{n\pi x}{4}\,dx = -\frac{1}{4}\Big[\frac{4}{n\pi}\sin\frac{n\pi x}{4}\Big]_{-4}^{0} + \frac{1}{4}\Big[\frac{4}{n\pi}\sin\frac{n\pi x}{4}\Big]_0^{4} = 0,$$
$$b_n = -\frac{1}{4}\int_{-4}^{0} \sin\frac{n\pi x}{4}\,dx + \frac{1}{4}\int_0^{4} \sin\frac{n\pi x}{4}\,dx = \frac{1}{4}\Big[\frac{4}{n\pi}\cos\frac{n\pi x}{4}\Big]_{-4}^{0} - \frac{1}{4}\Big[\frac{4}{n\pi}\cos\frac{n\pi x}{4}\Big]_0^{4}$$
$$= \frac{1}{n\pi}\big[1 - (-1)^n\big] - \frac{1}{n\pi}\big[(-1)^n - 1\big] = \frac{2}{n\pi} - \frac{2(-1)^n}{n\pi} = \frac{2\big(1 - (-1)^n\big)}{n\pi},$$
so that $b_n = \frac{4}{n\pi}$ for odd $n$ and $b_n = 0$ for even $n$. Therefore, the Fourier series of $f$ is
$$f(x) = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{2n-1}\sin\frac{(2n-1)\pi x}{4}.$$
The graph of this function and of a partial sum of its Fourier series are given in Figures 4.6a and 4.6b.

FIGURE 4.6a: Function


FIGURE 4.6b: Fourier Approximation Plot for 5 Harmonics

(g) We have
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} e^x\,dx = \frac{1}{\pi}\big(e^{\pi} - e^{-\pi}\big),$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} e^x\cos nx\,dx = \frac{(-1)^n\big(e^{\pi} - e^{-\pi}\big)}{\pi(1 + n^2)},$$
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} e^x\sin nx\,dx = \frac{n(-1)^n\big(e^{-\pi} - e^{\pi}\big)}{\pi(1 + n^2)}.$$
The Fourier series of $f$ is
$$f(x) = \frac{e^{\pi} - e^{-\pi}}{2\pi} + \sum_{n=1}^{\infty} \Big[\frac{(-1)^n\big(e^{\pi} - e^{-\pi}\big)}{\pi(1 + n^2)}\cos nx + \frac{n(-1)^n\big(e^{-\pi} - e^{\pi}\big)}{\pi(1 + n^2)}\sin nx\Big].$$
See Figures 4.7a and 4.7b for the graph of the function and of a partial sum of its Fourier series.
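Closed forms like these are easy to get wrong by a sign or a factor, so a quadrature check is worthwhile. The illustrative sketch below compares $a_2$ and $b_2$ for $f(x) = e^x$ against the closed forms; note the factor $n$ in the numerator of $b_n$, which the standard antiderivative of $e^x \sin nx$ produces:

```python
import math

def trapezoid(f, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

m = 2   # which coefficient pair to check
an_num = trapezoid(lambda x: math.exp(x) * math.cos(m * x), -math.pi, math.pi) / math.pi
bn_num = trapezoid(lambda x: math.exp(x) * math.sin(m * x), -math.pi, math.pi) / math.pi

sh = math.exp(math.pi) - math.exp(-math.pi)
an_exact = (-1) ** m * sh / (math.pi * (1 + m ** 2))
bn_exact = -m * (-1) ** m * sh / (math.pi * (1 + m ** 2))
print(abs(an_num - an_exact), abs(bn_num - bn_exact))
```

Both differences are at the level of the quadrature error, confirming the closed forms.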

4.2.3 Further Properties of Fourier Series


Periodic Extension
Let us recall the notion of a periodic function given in Chapter 1. A real
function of a single variable is called periodic with period p if f (x + p) = f (x)
for all x. For example, 8π is a period of the sine function as sin(x + 8π) = sin x
for all x. The smallest value of p for which f (x + p) = f (x) holds for all x
is called the fundamental period. For example, p = 2π is the fundamental
period of the sine function as 2π is the smallest value of p which satisfies

FIGURE 4.7a: Function

$f(x + p) = f(x)$ for all $x$. Let us point out that often "period" is defined to be the fundamental period.
Let $f$ be an arbitrary function defined on $(-l, l)$. Its Fourier series (if convergent)
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty} \Big(a_n\cos\frac{n\pi x}{l} + b_n\sin\frac{n\pi x}{l}\Big)$$
is a periodic function of $x$ of period $p = 2l$ and thus (if convergent to $f$) not only represents $f$ on $(-l, l)$, but also gives the periodic extension of $f$ to the real line.
Approximation by Partial Sums
Let
$$s_N(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N} \big(a_n\cos nx + b_n\sin nx\big)$$
denote the $N$th partial sum of the Fourier series of a function $f$ defined on $[-\pi, \pi]$. One may ask how well $s_N$ approximates $f$. We present without proof the following two results. If $f$ is continuous on $[-\pi, \pi]$ and $f(-\pi) = f(\pi)$, then
$$|f(x) - s_N(x)| \le \frac{1}{\sqrt{N}}\,\frac{1}{\sqrt{\pi}}\left(\int_{-\pi}^{\pi} |f'(t)|^2\,dt\right)^{1/2} \quad \text{for all } x.$$
If $f$ is continuous, piecewise differentiable and satisfies $f(-\pi) = f(\pi)$, then
$$|f(x) - s_N(x)| \le \sum_{n=N}^{\infty} \big(|a_n| + |b_n|\big) \quad \text{for all } x.$$

FIGURE 4.7b: Fourier Approximation Plot for 5 Harmonics

Gibbs Phenomenon

If we examine the graphs of the partial sums of the Fourier series in Example 224, we observe that all of them overshoot the true values of the function $f$ near its points of discontinuity. In fact, this phenomenon always occurs when we approximate a discontinuous function with Fourier series. It is known as the Gibbs phenomenon, in honor of Josiah Willard Gibbs, a mathematical physicist working at Yale, who analyzed it prior to 1900, after it had been discovered by Henry Wilbraham in 1848. One can show that the overshoot amounts to approximately 9% of the size of the jump, that is, of the difference $|f(x+) - f(x-)|$ of the one-sided limits at the discontinuity point $x$. The main point is that the amount of overshoot of the partial sums $s_N$ does not decrease when $N$ tends to infinity.
As an example, let us consider the Fourier series of the step function
$$f(x) = \begin{cases} -1, & \text{if } x \in [-\pi, 0), \\ 0, & \text{if } x = 0, \\ 1, & \text{if } x \in (0, \pi], \end{cases}$$
extended to a $2\pi$-periodic function on $\mathbb{R}$. Proceeding along the lines of the solution of Example 224(e) we obtain
$$f(x) = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{1}{2n-1}\sin(2n-1)x.$$
The graph of $s_{15}$ is given in Figure 4.8, and the graph of $s_{100}$ is given in Figure 4.9. We observe the overshoot of the partial sums around the points where $f$ is discontinuous.
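The 9% figure can be observed numerically. The illustrative sketch below evaluates the partial sums of the square-wave series near the jump at $x = 0$ and measures the overshoot relative to the jump of size 2, for several values of $N$; the first (highest) peak sits near $x = \pi/(2N)$:

```python
import math

def s_N(x, N):
    # partial sum of (4/pi) * sum sin((2n-1)x)/(2n-1)
    return (4 / math.pi) * sum(
        math.sin((2 * n - 1) * x) / (2 * n - 1) for n in range(1, N + 1))

overshoots = []
for N in (25, 100, 400):
    # sample finely on (0, pi/N], which contains the first peak at x ~ pi/(2N)
    peak = max(s_N(k * math.pi / (800 * N), N) for k in range(1, 800))
    overshoots.append((peak - 1) / 2)   # overshoot relative to the jump of size 2
    print(N, round(overshoots[-1], 3))
```

The measured ratio stays close to 0.09 for every $N$, illustrating that the overshoot does not shrink as $N$ grows.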

Complex Form of Fourier Series



FIGURE 4.8: Fourier Approximation Plot for 15 Harmonics for Square Function

For this paragraph, we assume the reader to have some basic familiarity with complex numbers. The sine and cosine are related to the complex exponential function by
$$e^{i\theta} = \cos\theta + i\sin\theta, \qquad e^{-i\theta} = \cos\theta - i\sin\theta,$$
$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i},$$
where $\theta$ is any real number. For a real-valued function $f$ with domain $(-\pi, \pi)$, we define the complex Fourier coefficients by
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx, \tag{4.18}$$
where $n$ is any integer $0, \pm 1, \pm 2, \ldots$. Since
$$\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \int_{-\pi}^{\pi} f(x)\big(\cos nx - i\sin nx\big)\,dx = \pi a_n - i\pi b_n, \qquad n > 0,$$
and, analogously,
$$\int_{-\pi}^{\pi} f(x)e^{inx}\,dx = \pi a_n + i\pi b_n, \qquad n > 0,$$
FIGURE 4.9: Fourier Approximation Plot for 100 Harmonics for Square Function

we see that the complex Fourier coefficients are related to the Fourier sine and cosine coefficients by
$$c_n = \frac{1}{2}(a_n - ib_n), \quad c_{-n} = \frac{1}{2}(a_n + ib_n), \quad a_n = c_n + c_{-n}, \quad b_n = i(c_n - c_{-n}), \tag{4.19}$$
for any $n > 0$. Moreover,
$$c_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{a_0}{2}. \tag{4.20}$$
From (4.19) we obtain
$$c_n e^{inx} + c_{-n}e^{-inx} = c_n(\cos nx + i\sin nx) + c_{-n}(\cos nx - i\sin nx) = a_n\cos nx + b_n\sin nx.$$
Therefore, the partial sums $s_N$ of the Fourier series can also be represented as
$$s_N(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N}\big(a_n\cos nx + b_n\sin nx\big) = \sum_{n=-N}^{N} c_n e^{inx},$$
and the Fourier series of $f$ can be written in complex form as
$$\sum_{n=-\infty}^{\infty} c_n e^{inx}.$$

Over an interval $(-l, l)$, $l > 0$, the complex form of the Fourier series is defined as
$$\sum_{n=-\infty}^{\infty} c_n e^{in\pi x/l}, \qquad \text{where } c_n = \frac{1}{2l}\int_{-l}^{l} f(x)e^{-in\pi x/l}\,dx, \quad n = 0, \pm 1, \pm 2, \pm 3, \ldots$$
It can be verified that the set of functions $\big\{\frac{1}{\sqrt{2\pi}}e^{inx}\big\}_{n\in\mathbb{Z}}$ is an orthonormal set.
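Relation (4.19) between the complex and the real coefficients can be confirmed numerically for any sample function; since both sides are computed with the same quadrature rule, they agree to machine precision. An illustrative Python sketch, using the arbitrary test function $f(x) = x^2 + x$:

```python
import math, cmath

def trapezoid(f, a, b, n=20000):
    # works unchanged for complex-valued integrands
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

f = lambda x: x * x + x                  # an arbitrary test function on (-pi, pi)
n = 3
cn = trapezoid(lambda x: f(x) * cmath.exp(-1j * n * x),
               -math.pi, math.pi) / (2 * math.pi)
an = trapezoid(lambda x: f(x) * math.cos(n * x), -math.pi, math.pi) / math.pi
bn = trapezoid(lambda x: f(x) * math.sin(n * x), -math.pi, math.pi) / math.pi

# relation (4.19): c_n = (a_n - i*b_n)/2
print(abs(cn - (an - 1j * bn) / 2))
```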
Sine and Cosine Fourier Series
We consider Fourier series of even and odd functions. Recall that if $g$ is an even function on $(-\pi, \pi)$, then
$$g(-x) = g(x) \ \text{ for } x \in [0, \pi), \qquad \int_{-\pi}^{0} g(x)\,dx = \int_0^{\pi} g(x)\,dx,$$
and if $h$ is an odd function on $(-\pi, \pi)$, then
$$h(-x) = -h(x) \ \text{ for } x \in [0, \pi), \qquad \int_{-\pi}^{0} h(x)\,dx = -\int_0^{\pi} h(x)\,dx.$$

Thus, if $f$ is even on $(-\pi, \pi)$, then its Fourier coefficients satisfy
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{2}{\pi}\int_0^{\pi} f(x)\,dx,$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{2}{\pi}\int_0^{\pi} f(x)\cos nx\,dx,$$
since the function $g(x) = f(x)\cos nx$ is even, and
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = 0,$$
since the function $h(x) = f(x)\sin nx$ is odd. Consequently, the Fourier series of an even function $f$ is a cosine series,
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos nx,$$
where the coefficients $a_n$ are given above.



Similarly we find that the Fourier series of an odd function $f$ contains only sine terms, hence it is a sine series
$$\sum_{n=1}^{\infty} b_n\sin nx, \qquad \text{where } b_n = \frac{2}{\pi}\int_0^{\pi} f(x)\sin nx\,dx.$$
Indeed, in this case $g(x) = f(x)\cos nx$ is an odd function, while $h(x) = f(x)\sin nx$ is an even function, so in particular
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = -\frac{1}{\pi}\int_0^{\pi} f(x)\,dx + \frac{1}{\pi}\int_0^{\pi} f(x)\,dx = 0,$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = -\frac{1}{\pi}\int_0^{\pi} f(x)\cos nx\,dx + \frac{1}{\pi}\int_0^{\pi} f(x)\cos nx\,dx = 0.$$
As a consequence, if we know that $f$ is odd or even on $(-\pi, \pi)$, we need to compute only the $b_n$'s if $f$ is an odd function, or only the $a_n$'s if $f$ is an even function.
Example 225. Find the Fourier series of the function $f(x) = x^2$, $x \in (-\pi, \pi)$.
Solution: Since $f(-x) = (-x)^2 = x^2 = f(x)$, $f$ is even and so $b_n = 0$ for $n = 1, 2, 3, \ldots$ We compute
$$a_0 = \frac{2}{\pi}\int_0^{\pi} x^2\,dx = \frac{2}{\pi}\cdot\frac{1}{3}\big[x^3\big]_0^{\pi} = \frac{2}{3}\pi^2,$$
and
$$a_n = \frac{2}{\pi}\int_0^{\pi} x^2\cos nx\,dx = \frac{2}{\pi}\Big[\frac{x^2}{n}\sin nx\Big]_0^{\pi} - \frac{4}{n\pi}\int_0^{\pi} x\sin nx\,dx$$
$$= 0 + \frac{4}{n^2\pi}\big[x\cos nx\big]_0^{\pi} - \frac{4}{n^2\pi}\int_0^{\pi} \cos nx\,dx = \frac{4}{n^2}(-1)^n - \frac{4}{n^3\pi}\big[\sin nx\big]_0^{\pi} = \frac{4}{n^2}(-1)^n.$$
Thus the Fourier series of $f$ is given as the cosine series
$$\frac{1}{3}\pi^2 + 4\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}\cos nx.$$
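As a side remark, evaluating this series at $x = \pi$, where the periodic extension of $x^2$ is continuous, gives $\pi^2 = \pi^2/3 + 4\sum_{n=1}^{\infty} 1/n^2$, i.e. the classical value $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$. A quick illustrative numerical confirmation:

```python
import math

def series(x, N):
    # partial sum of the cosine series of x**2 derived above
    return math.pi ** 2 / 3 + 4 * sum(
        (-1) ** n * math.cos(n * x) / n ** 2 for n in range(1, N + 1))

# at x = pi the periodic extension is continuous, so the series tends to pi**2
val = series(math.pi, 20000)
print(val, math.pi ** 2)
```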

Phase Angle Form and Frequency Spectrum

Let $f$ be a periodic function defined on the real line which has the fundamental period $l$, that is, $f(x+l) = f(x)$ for all $x$, and $l$ is the smallest number satisfying this condition. We define $\omega = 2\pi/l$ as the frequency corresponding to $l$. Let
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty} \big[a_n\cos(n\omega x) + b_n\sin(n\omega x)\big] \tag{4.21}$$
be the Fourier series of $f$ on $[-l/2, l/2]$. The series
$$\frac{1}{2}a_0 + \sum_{n=1}^{\infty} d_n\cos(n\omega x + \delta_n) \tag{4.22}$$
is called the phase angle form of the Fourier series. Indeed, if two pairs $(a, b)$ and $(d, \delta)$ of numbers are related by
$$a = d\cos\delta, \quad b = -d\sin\delta, \qquad d = \sqrt{a^2 + b^2}, \quad \delta = \arctan\Big(-\frac{b}{a}\Big), \tag{4.23}$$
then
$$a\cos(n\omega x) + b\sin(n\omega x) = d\cos(n\omega x + \delta)$$
holds for all $x$, as a consequence of the trigonometric identities, so the series (4.21) and (4.22) correspond term by term. The phase angle form of the Fourier series is also called its harmonic form. It represents a periodic function as a superposition of cosine waves. The term $\cos(n\omega x + \delta_n)$ is the $n$th harmonic, $d_n$ is the $n$th harmonic amplitude and $\delta_n$ is the $n$th phase angle of $f$.
Note that the harmonic amplitude satisfies
$$d_n = 2|c_n|, \tag{4.24}$$
where $c_n = (a_n - ib_n)/2$ is the $n$th complex Fourier coefficient of $f$ as introduced earlier.
The amplitude spectrum or frequency spectrum (Figure 4.10) of the periodic function $f$ is a plot of $|c_n| = d_n/2$ on the vertical axis versus $n$ along the horizontal axis. The phase spectrum of $f$ is a plot of the points $(n, \delta_n)$ for $n = 0, 1, 2, \ldots$, where $\delta_n = \arctan(-b_n/a_n)$ is the $n$th phase angle of $f$.
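The conversion (4.23) is a one-liner in code; an illustrative sketch follows, in which `atan2` is used as a robust substitute for $\arctan(-b/a)$ (it handles $a = 0$ and gets the quadrant right). The sketch also verifies the defining identity for a sample coefficient pair:

```python
import math

def phase_angle_form(a, b):
    """(a, b) -> (d, delta) as in (4.23); atan2 is a robust arctan(-b/a)."""
    d = math.hypot(a, b)
    delta = math.atan2(-b, a)
    return d, delta

a, b = 1.0, -1.0                      # sample coefficient pair
d, delta = phase_angle_form(a, b)

# the defining identity: a*cos(wx) + b*sin(wx) = d*cos(wx + delta)
w, x = 2.0, 0.7
lhs = a * math.cos(w * x) + b * math.sin(w * x)
rhs = d * math.cos(w * x + delta)
print(d, delta, abs(lhs - rhs))
```

Here $d = \sqrt{2}$ and $\delta = \pi/4$, and the two sides of the identity agree to machine precision.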
Example 226. Find the complex Fourier series of $f$ on the given intervals. Furthermore find the frequency spectrum of the function.
(a) $f(x) = \begin{cases} 0, & -\pi < x < 0, \\ x, & 0 \le x < \pi, \end{cases}$ on $[-\pi, \pi]$.
(b) $f(x) = \begin{cases} -1, & -2 < x < 0, \\ 1, & 0 \le x < 2, \end{cases}$ on $[-2, 2]$.
(c) $f(x) = \begin{cases} \cos x, & 0 < x < \pi/2, \\ 0, & \pi/2 \le x < \pi, \end{cases}$ on $[0, \pi]$.

Solution: (a) We have
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \frac{1}{2\pi}\int_0^{\pi} xe^{-inx}\,dx = \frac{1}{2\pi}\Big[-\frac{xe^{-inx}}{in}\Big]_0^{\pi} + \frac{1}{2\pi in}\int_0^{\pi} e^{-inx}\,dx$$
$$= \frac{i}{2n}e^{-in\pi} + \frac{1}{2\pi n^2}\big(e^{-in\pi} - 1\big) = \frac{1 + in\pi}{2\pi n^2}e^{-in\pi} - \frac{1}{2\pi n^2}, \qquad \text{for } n \neq 0,$$
$$c_0 = \frac{1}{2\pi}\int_0^{\pi} x\,dx = \frac{\pi}{4}.$$
The complex Fourier representation of $f$ becomes
$$f(x) = \frac{\pi}{4} + \frac{1}{2\pi}\sum_{n=-\infty,\ n\neq 0}^{\infty} \frac{1}{n^2}\big[(1 + in\pi)e^{-in\pi} - 1\big]e^{inx}.$$
Here we have used the formulas
$$e^{in\pi} = (-1)^n = e^{-in\pi}, \qquad e^{-2\pi in} = 1, \qquad e^{-in\pi/2} = (-i)^n.$$

(b) We have
$$c_n = \frac{1}{4}\int_{-2}^{2} f(x)e^{-in\pi x/2}\,dx = \frac{1}{4}\Big[\int_{-2}^{0} (-1)e^{-in\pi x/2}\,dx + \int_0^{2} e^{-in\pi x/2}\,dx\Big]$$
$$= \frac{i}{2n\pi}\big[-1 + e^{in\pi} + e^{-in\pi} - 1\big] = \frac{i}{2n\pi}\big[2(-1)^n - 2\big] = \frac{1 - (-1)^n}{n\pi i}, \qquad \text{for } n \neq 0,$$
$$c_0 = \frac{1}{4}\int_{-2}^{2} f(x)\,dx = 0.$$
Thus
$$f(x) = \sum_{n=-\infty,\ n\neq 0}^{\infty} \frac{1 - (-1)^n}{n\pi i}\,e^{in\pi x/2}.$$
The fundamental period is equal to 4, so $\omega = 2\pi/4 = \pi/2$, $c_0 = 0$ and $|c_n| = \big(1 - (-1)^n\big)/n\pi$.

FIGURE 4.10: Frequency Spectrum
(c) Using $\cos x = (e^{ix} + e^{-ix})/2$ we get
$$c_n = \frac{1}{\pi}\int_0^{\pi} f(x)e^{-2inx}\,dx = \frac{1}{\pi}\int_0^{\pi/2} \cos x\, e^{-2inx}\,dx = \frac{1}{2\pi}\int_0^{\pi/2} \big(e^{(1-2n)ix} + e^{-(1+2n)ix}\big)\,dx$$
$$= \frac{1}{2\pi}\Big[\frac{e^{(1-2n)ix}}{i(1-2n)} - \frac{e^{-(1+2n)ix}}{i(1+2n)}\Big]_0^{\pi/2} = \frac{(-1)^n + 2ni}{\pi(1 - 4n^2)}$$
(for $n = 0$ this gives $c_0 = 1/\pi$, in agreement with direct integration).
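Coefficient computations of this kind are easily checked by quadrature. The illustrative sketch below compares the closed form $c_n = \big((-1)^n + 2ni\big)/\big(\pi(1-4n^2)\big)$ against direct numerical integration for $n = 0$ and $n = 1$; the value of $f$ at the single endpoint $x = 0$ is taken as $\cos 0 = 1$, which does not affect the integral:

```python
import math, cmath

def trapezoid(f, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def f(x):
    # value at x = 0 chosen for definiteness; it does not affect the integral
    return math.cos(x) if 0 <= x < math.pi / 2 else 0.0

def c(n):
    return trapezoid(lambda x: f(x) * cmath.exp(-2j * n * x), 0.0, math.pi) / math.pi

def c_closed(n):
    return ((-1) ** n + 2j * n) / (math.pi * (1 - 4 * n ** 2))

c0, c1 = c(0), c(1)
print(abs(c0 - 1 / math.pi), abs(c1 - c_closed(1)))
```

Both differences are at the level of the quadrature error.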

4.3 Fourier Transform


The Fourier transform is a mathematical procedure that breaks a function
into the frequencies that compose it, as a prism breaks light into colors. It
transforms a function f into a new function, fˆ or F[f ] (read as “f hat” or
“script f”) which is called the Fourier transform of f . Depending on the con-
text, the argument of f is a time variable or a spatial variable. The argument
of fˆ usually has the meaning of a frequency.
A function and its Fourier transform are two faces of the same informa-
tion. The function exhibits the time (or spatial) information and hides the
information about frequencies. The Fourier transform displays information
about frequencies and hides the time (or spatial) information. Nevertheless

the function and its Fourier transform both contain all the information about
the function. One can compute the transform from the original function as
well as reconstruct the function from its transform, that is, one can invert the
transform.
In the previous section, we have studied the decomposition of a func-
tion into its Fourier series, which is a periodic function. This works well for
functions defined on a bounded interval, as we can always think of them as
periodically extended to the whole real line. In contrast to that, the Fourier
transform acts on arbitrary (non-periodic) functions.

4.3.1 Basic Properties of Fourier Transform


When dealing with the Fourier transform, one constantly encounters integrals over the whole real line $(-\infty, \infty)$, that is, improper integrals. In order to simplify the exposition in this section, we call a function $f$ defined on $\mathbb{R} = (-\infty, \infty)$ integrable on $\mathbb{R}$ resp. square integrable on $\mathbb{R}$, if the improper integral
$$\int_{-\infty}^{\infty} |f(x)|\,dx, \quad \text{resp.} \quad \int_{-\infty}^{\infty} |f(x)|^2\,dx,$$
converges.
For the definition of the Fourier transform, several variants are in common use; see Remark 57 below. We choose the following one.
Definition 80. (Fourier transform)
Let the function $f$ be defined on $\mathbb{R}$ and assume that it is integrable on $\mathbb{R}$. The function $\hat{f}$, defined on $\mathbb{R}$ by
$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)e^{-i\xi t}\,dt, \tag{4.25}$$
is called the Fourier transform of $f$.


In signal analysis, t is understood as the time variable and ξ is understood
as the frequency variable, see Section 4.6 below.
Remark 50. (i) According to (4.25), integrals of complex valued functions are involved in the definition of the Fourier transform. They are defined as
$$\int_{-\infty}^{\infty} f(t)e^{-i\xi t}\,dt = \int_{-\infty}^{\infty} f(t)\big(\cos(\xi t) - i\sin(\xi t)\big)\,dt = \int_{-\infty}^{\infty} f(t)\cos(\xi t)\,dt - i\int_{-\infty}^{\infty} f(t)\sin(\xi t)\,dt,$$
that is, the real and imaginary parts of the integrand are evaluated separately and yield the real and imaginary part of the integral, which is a complex number (in this case, the number $\hat{f}(\xi)$).

(ii) The integrand in (4.25) satisfies

|f (t)e−iξt | = |f (t)| · |e−iξt | = |f (t)| ,

since |eix | = 1 for all real numbers x. Therefore and since f is assumed
to be integrable on R, the improper integral in (4.25) is defined. One then
infers from the properties of parameter-dependent integrals that the Fourier
transform fˆ is a continuous function. In addition, one can prove that fˆ(ξ)
tends to zero as ξ tends to ±∞. (The latter result is called the Riemann-
Lebesgue lemma.)
(iii) As it stands, the requirement that f has to be integrable on the whole
line is rather restrictive. For example, Definition 80 does not cover the case
when f is a constant function. Indeed, the Fourier transform of the constant
1 can be defined, but it is no longer a function on R; it is a more general
mathematical object (a multiple of the so-called Dirac delta distribution). This, however,
is outside the scope of this book.
Example 227. Find the Fourier transforms of the following functions.
(a) f (t) = e−|t| .
(b) $f(t) = \begin{cases} 0, & t < 0 \\ e^{-t}, & t \ge 0. \end{cases}$

(c) Let a and k be positive numbers, let

$$f(t) = \begin{cases} k, & -a \le t < a \\ 0, & \text{otherwise.} \end{cases}$$

Solution: (a) For f (t) = e−|t| we get

$$\begin{aligned}
\hat{f}(\xi) &= \int_{-\infty}^{\infty} e^{-|t|}e^{-i\xi t}\,dt
= \int_{-\infty}^{0} e^{t}e^{-i\xi t}\,dt + \int_{0}^{\infty} e^{-t}e^{-i\xi t}\,dt \\
&= \left[\frac{1}{1-i\xi}e^{(1-i\xi)t}\right]_{t=-\infty}^{t=0}
+ \left[\frac{-1}{1+i\xi}e^{-(1+i\xi)t}\right]_{t=0}^{t=\infty} \\
&= \frac{1}{1-i\xi} + \frac{1}{1+i\xi} = \frac{2}{1+\xi^2}.
\end{aligned}$$

(b) Let
$$H(t) = \begin{cases} 1, & t \ge 0 \\ 0, & t < 0 \end{cases}$$
Fourier Methods and Integral Transforms 369

be the Heaviside function (see Chapter 2). Then f (t) = H(t)e−t and

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)e^{-i\xi t}\,dt
= \int_{-\infty}^{\infty} H(t)e^{-t}e^{-i\xi t}\,dt
= \int_{0}^{\infty} e^{-(1+i\xi)t}\,dt
= \left[-\frac{1}{1+i\xi}e^{-(1+i\xi)t}\right]_{t=0}^{t=\infty}
= \frac{1}{1+i\xi}.$$

(c) We obtain

$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)e^{-i\xi t}\,dt
= \int_{-a}^{a} k e^{-i\xi t}\,dt
= \left[\frac{-k}{i\xi}e^{-i\xi t}\right]_{t=-a}^{t=a}
= -\frac{k}{i\xi}\left[e^{-i\xi a} - e^{i\xi a}\right]
= \frac{2k}{\xi}\sin(a\xi),$$

since
$$\sin(a\xi) = \frac{e^{i\xi a} - e^{-i\xi a}}{2i}.$$
Remark 51. (i) For a given function f , the Fourier transform (if defined)
yields a new function fˆ. We may thus view the Fourier transform as a mapping
whose domain and range are certain sets of functions. Such a mapping is
commonly called an operator. Let us denote it by F, so

F(f ) = fˆ . (4.26)

From Definition 80 we see that F is linear, that is,

F(αf + βg) = αF(f ) + βF(g)

holds for functions f , g and scalars α, β.


(ii) Instead of F(f ) = fˆ one often writes

F[f (t)] = fˆ(ξ) .

Although this is, in a strict sense, mathematically not correct (it confuses
the functions f and fˆ with their function values f (t) and fˆ(ξ)), it leads to a
concise way of writing formulas. In this notation, the result of Example 227
(a) becomes
$$\mathcal{F}[e^{-|t|}] = \frac{2}{1+\xi^2}.$$
In the next two theorems, we state some important properties of the
Fourier transform.
Theorem 46. Let f be a function which is integrable on R.
(a) Time shift. Let t0 be a real number, let g(t) = f (t − t0 ). Then

ĝ(ξ) = e−iξt0 fˆ(ξ) , ξ ∈ R. (4.27)



This means that the Fourier transform of the translated function f equals the
Fourier transform of f multiplied by a factor. In shorter notation, without
introducing the function g explicitly, (4.27) becomes
F[f (t − t0 )] = e−iξt0 fˆ(ξ) .
(b) Frequency shift. Let ξ0 be a real number, let g(t) = eiξ0 t f (t). Then
ĝ(ξ) = fˆ(ξ − ξ0 ) , ξ ∈ R, (4.28)
or, in shorter notation,
F[eiξ0 t f (t)] = fˆ(ξ − ξ0 ) .
(c) Scaling or dilation. Let a be a real number with a ≠ 0, let g(t) = f (at).
Then
$$\hat{g}(\xi) = \frac{1}{|a|}\hat{f}\!\left(\frac{\xi}{a}\right), \qquad \xi \in \mathbb{R}. \qquad (4.29)$$
This states that the Fourier transform of the scaled function is obtained by replacing ξ by ξ/a in the Fourier transform of the original function and dividing
by the magnitude of the scaling factor.
The formulas in the theorem above are obtained from properties of the
integral. For example, (4.27) results from the computation

$$\hat{g}(\xi) = \int_{-\infty}^{\infty} f(t-t_0)e^{-i\xi t}\,dt
= e^{-i\xi t_0}\int_{-\infty}^{\infty} f(t-t_0)e^{-i\xi(t-t_0)}\,dt
= e^{-i\xi t_0}\int_{-\infty}^{\infty} f(s)e^{-i\xi s}\,ds
= e^{-i\xi t_0}\hat{f}(\xi),$$

and (4.29) from the computation (substituting u = at)

$$\hat{g}(\xi) = \int_{-\infty}^{\infty} f(at)e^{-i\xi t}\,dt
= \frac{1}{|a|}\int_{-\infty}^{\infty} f(u)e^{-iu\xi/a}\,du
= \frac{1}{|a|}\hat{f}\!\left(\frac{\xi}{a}\right).$$
Remark 52. The properties given in Theorem 46 can also be written in
operator form. For example, the time shift can be expressed by the translation operator $T_{t_0}$ which maps a function f to its translate $T_{t_0}f$ defined by
$(T_{t_0}f)(t) = f(t-t_0)$. Equation (4.27) then takes the form
$$\widehat{T_{t_0}f}(\xi) = e^{-i\xi t_0}\hat{f}(\xi).$$
Theorem 47. (a) Suppose that f is continuous, f ′ is piecewise continuous
and both f and f ′ are integrable on R. Then
$$\widehat{f'}(\xi) = \mathcal{F}[f'](\xi) = i\xi\hat{f}(\xi). \qquad (4.30)$$
(b) Suppose that f satisfies $\int_{-\infty}^{\infty}|f(t)|\,dt < \infty$ and $\int_{-\infty}^{\infty}|tf(t)|\,dt < \infty$. Then
(see Remark 51 for the notation)
$$\mathcal{F}[tf(t)] = i\frac{d}{d\xi}\hat{f}(\xi). \qquad (4.31)$$


Remark 53. If we apply Theorem 47 repeatedly to derivatives of f , we obtain the
formulas
$$\widehat{f^{(n)}}(\xi) = \mathcal{F}[f^{(n)}](\xi) = (i\xi)^n\hat{f}(\xi), \qquad (4.32)$$
$$\mathcal{F}[t^n f(t)] = i^n\frac{d^n}{d\xi^n}\hat{f}(\xi), \qquad (4.33)$$
provided f and its derivatives up to order n satisfy the corresponding assumptions. In particular, for n = 2 we have
$$\mathcal{F}[t^2 f(t)] = -\frac{d^2}{d\xi^2}\hat{f}(\xi). \qquad (4.34)$$

Theorem 48. (Plancherel’s identity)
If f is integrable as well as square integrable on R, and if the same holds for
g, then
$$\int_{-\infty}^{\infty}\hat{f}(\xi)\overline{\hat{g}(\xi)}\,d\xi = 2\pi\int_{-\infty}^{\infty} f(t)\overline{g(t)}\,dt. \qquad (4.35)$$
(Here $\overline{c}$ denotes the complex conjugate of the complex number c.)

Setting g = f in the preceding theorem, we obtain the following.


Theorem 49. (Parseval’s identity)
If f and f 2 are integrable on R, then
$$\int_{-\infty}^{\infty}|\hat{f}(\xi)|^2\,d\xi = 2\pi\int_{-\infty}^{\infty}|f(t)|^2\,dt. \qquad (4.36)$$
If we interpret f as a signal, the norm
$$\|f\| = \left(\int_{-\infty}^{\infty}|f(t)|^2\,dt\right)^{1/2}$$
represents the energy of the signal.
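For the transform pair of Example 227(a), both sides of Parseval’s identity (4.36) can be evaluated in closed form: $\int_{-\infty}^{\infty} 4/(1+\xi^2)^2\,d\xi = 2\pi$ and $2\pi\int_{-\infty}^{\infty} e^{-2|t|}\,dt = 2\pi$. The Python/NumPy sketch below confirms this numerically (the integration bounds are illustrative truncations of the improper integrals):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

# f(t) = e^{-|t|},  f_hat(xi) = 2/(1 + xi^2)  (Example 227(a))
t = np.linspace(-40.0, 40.0, 200001)
xi = np.linspace(-400.0, 400.0, 2000001)

lhs = trapezoid((2.0 / (1.0 + xi**2))**2, xi)               # integral of |f_hat|^2
rhs = 2.0 * np.pi * trapezoid(np.exp(-2.0 * np.abs(t)), t)  # 2*pi times energy of f

# Both sides equal 2*pi exactly
assert abs(lhs - 2.0 * np.pi) < 1e-4
assert abs(rhs - 2.0 * np.pi) < 1e-4
```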


Inverse Fourier Transform
In the beginning of this section, we defined the Fourier transform fˆ of a
function f . We can reverse this procedure if we know fˆ and we can obtain f
according to the following result.
Theorem 50. Suppose that f is continuous and that f and fˆ are integrable
on R. Then
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\xi)e^{i\xi t}\,d\xi \qquad (4.37)$$
holds for all t ∈ R.

In abstract terms, the right hand side of (4.37) defines the inverse F −1 of
the Fourier transform F. It is called the inverse Fourier transform:
$$(\mathcal{F}^{-1}[g])(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} g(\xi)e^{i\xi t}\,d\xi. \qquad (4.38)$$
Indeed, we see that F −1 [F[f ]] = f .
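Continuing with Example 227(a), the inversion formula (4.37) can be tested numerically: feeding $\hat{f}(\xi) = 2/(1+\xi^2)$ into (4.37) should return $e^{-|t|}$. A Python/NumPy sketch (the frequency cutoff 5000 is an illustrative truncation of the improper integral):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (also works for complex integrands)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

xi = np.linspace(-5000.0, 5000.0, 1000001)
f_hat = 2.0 / (1.0 + xi**2)          # transform of e^{-|t|} from Example 227(a)

def inverse_transform(t):
    """Right hand side of (4.37), truncated to |xi| <= 5000."""
    return trapezoid(f_hat * np.exp(1j * xi * t), xi).real / (2.0 * np.pi)

for t in (0.0, 0.5, 1.0, 2.0):
    assert abs(inverse_transform(t) - np.exp(-abs(t))) < 1e-3
```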


Remark 54. The following interpretation of Theorem 50 is fundamental for
many applications of the Fourier transform. Consider t as a time variable. For
fixed ξ, the values e^{iξt} traverse the unit circle at constant speed. Since
ξ = 2π corresponds to the completion of one cycle in one unit of time,
ξ/2π gives the number of cycles per unit time, which is called the frequency. Its
unit is Hertz if t is measured in seconds. The quantity ξ itself is called the angular frequency;
it gives the number of radians traversed per unit time. Seen in this light,
formula (4.37) is a decomposition of the original function f into a weighted
sum of oscillations in the form of an integral. The weight of the angular frequency
ξ is given by the value fˆ(ξ) of the Fourier transform of f .
Remark 55. If f is twice differentiable and if f , f 0 and f 00 are integrable on
R, then fˆ is integrable on R, so in this case we can apply Theorem 50.
For piecewise continuous functions, one has the following result.
Theorem 51. Let f and f ′ be piecewise continuous and assume that f is
integrable on R. Then
$$\lim_{R\to\infty}\frac{1}{2\pi}\int_{-R}^{R}\hat{f}(\xi)e^{i\xi t}\,d\xi = \frac{1}{2}\big(f(t+) + f(t-)\big) \qquad (4.39)$$
holds for all t ∈ R.
Remark 56. For a function h, the limit
$$\lim_{R\to\infty}\int_{-R}^{R} h(x)\,dx,$$
if it exists, is called the principal value (or Cauchy principal value) of
$\int_{-\infty}^{\infty} h(x)\,dx$. Thus, under the assumptions of Theorem 51 we also obtain the
formula
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\xi)e^{i\xi t}\,d\xi$$
at points t where f is continuous, provided we interpret the integral as its
principal value.
Remark 57. If one wants the frequency variable ξ to denote ordinary frequency instead of angular frequency, one defines the Fourier transform by
$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)e^{-2\pi i t\xi}\,dt.$$
The inversion formula then becomes
$$f(t) = \int_{-\infty}^{\infty}\hat{f}(\xi)e^{2\pi i t\xi}\,d\xi.$$
In this case, the frequency ξ is measured in Hertz (cycles per second) if t is
measured in seconds. If one keeps the angular frequency, but wants a more
symmetric relation between the transform and its inverse, one uses
$$\hat{f}(\xi) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)e^{-it\xi}\,dt, \qquad
f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat{f}(\xi)e^{it\xi}\,d\xi.$$
Less common is an interchange of the sign in the exponent,
$$\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)e^{it\xi}\,dt, \qquad
f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\xi)e^{-it\xi}\,d\xi.$$
It is also possible to mix these variants. Therefore, when dealing with the
Fourier transform, one has to make sure which convention is used.
Localization and Uncertainty Principle
In this subsection we explain the fact that a function f and its Fourier
transform fˆ cannot both be concentrated on a small interval.
First, consider the dilation g(t) = f (at). For a > 1, g represents a compression of f around t = 0 by the factor a. On the other hand, Theorem 46(c)
says that
$$\hat{g}(\xi) = \frac{1}{|a|}\hat{f}\!\left(\frac{\xi}{a}\right),$$
that is, we have to stretch fˆ by the factor a to obtain ĝ. If a < 1, g is a stretched
version of f while ĝ is a compressed version of fˆ.
Second, assume that fˆ(ξ) = 0 outside some interval [−l, l]. We say in
this case that f has bandwidth l. The Fourier inversion formula then becomes
$$f(t) = \frac{1}{2\pi}\int_{-l}^{l}\hat{f}(\xi)e^{i\xi t}\,d\xi,$$
and one concludes, with the aid of a result of complex function theory which we
cannot present here, that f can be zero only at isolated points. Hence f spreads
out to infinity and in particular cannot have the property that f (t) = 0 outside
some interval [−M, M ].
Third, one can quantify this phenomenon. Consider the expression
$$\Delta f = \left(\int_{-\infty}^{\infty} t^2|f(t)|^2\,dt\right)\cdot\left(\int_{-\infty}^{\infty}|f(t)|^2\,dt\right)^{-1}.$$
If the large values of f arise only at small values of t and f decays rapidly as t
gets large, the numerator will be small in comparison with the denominator, so
∆f measures the concentration (or localization) of f around t = 0.
It can be proved that
$$(\Delta f)\cdot(\Delta\hat{f}) \ge \frac{1}{4},$$
that is, if ∆f is small then ∆fˆ has to be large and vice versa. This is called
the uncertainty principle.
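The Gaussian $f(t) = e^{-t^2/2}$ is the standard example that turns this inequality into an equality: its Fourier transform is $\sqrt{2\pi}\,e^{-\xi^2/2}$, and $\Delta f = \Delta\hat{f} = 1/2$, so the product is exactly $1/4$. A Python/NumPy sketch checking this (the integration grid is an illustrative choice):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def localization(g, x):
    """Delta_g = (integral of x^2 |g|^2 dx) / (integral of |g|^2 dx)."""
    return trapezoid(x**2 * np.abs(g)**2, x) / trapezoid(np.abs(g)**2, x)

t = np.linspace(-20.0, 20.0, 200001)
f = np.exp(-t**2 / 2)                             # Gaussian signal
f_hat = np.sqrt(2.0 * np.pi) * np.exp(-t**2 / 2)  # its Fourier transform

delta_f = localization(f, t)
delta_f_hat = localization(f_hat, t)

assert abs(delta_f - 0.5) < 1e-8
assert abs(delta_f * delta_f_hat - 0.25) < 1e-8   # equality case of the bound
```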

4.3.2 Convolution
The convolution of two sequences a = {an } and b = {bn }, where n ranges
over all integers, is defined as
$$(a*b)_k = \sum_{j=-\infty}^{\infty} a_j b_{k-j},$$
provided the infinite series converges. If aj and bj are non-zero only for j ≥ 0,
the convolution becomes the finite sum
$$(a*b)_k = \sum_{j=0}^{k} a_j b_{k-j}.$$
For example, for k = 2 and k = 3 we have
$$(a*b)_2 = \sum_{j=0}^{2} a_j b_{2-j} = a_0b_2 + a_1b_1 + a_2b_0,$$
$$(a*b)_3 = \sum_{j=0}^{3} a_j b_{3-j} = a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0.$$
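These finite sums are exactly what a discrete convolution routine computes. The sketch below (Python/NumPy, with arbitrary illustrative sample values) checks the k = 2 and k = 3 entries against the formulas above:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # a_0, ..., a_3
b = np.array([5.0, 6.0, 7.0, 8.0])   # b_0, ..., b_3
c = np.convolve(a, b)                # entry k is sum over j of a_j * b_{k-j}

assert c[2] == a[0]*b[2] + a[1]*b[1] + a[2]*b[0]              # (a*b)_2 = 34
assert c[3] == a[0]*b[3] + a[1]*b[2] + a[2]*b[1] + a[3]*b[0]  # (a*b)_3 = 60
```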

For functions, convolution involves the integral instead of the sum.


Definition 81. Let f and g be two functions defined on the real line R and
assume that f and g are integrable on R. The convolution of f and g is
denoted by f ∗ g and defined as
$$(f*g)(x) = \int_{-\infty}^{\infty} f(x-y)g(y)\,dy \qquad (4.40)$$
(read as f star g, or f convolved with g).


One can prove that, under the stated assumptions, the improper integral
in (4.40) indeed converges (we will not do it here), so that f ∗ g is integrable
on R, too.
Remark 58. (a) The convolution has the following properties:
(i) For all functions f, g, h as in Definition 81,
$$(f+g)*h = (f*h) + (g*h),$$
that is, [(f + g) ∗ h](x) = (f ∗ h)(x) + (g ∗ h)(x) for all x ∈ R.


(ii) (λf ) ∗ g = λ(f ∗ g) for functions f, g and scalars λ.
(iii) f ∗ (g + h) = (f ∗ g) + (f ∗ h).
(iv) f ∗ (λg) = λ(f ∗ g).
(v) f ∗ (g ∗ h) = (f ∗ g) ∗ h.
(vi) f ∗ g = g ∗ f .
These properties tell us that the convolution f ∗ g is linear with respect to f
and g separately, and that it is commutative (property (vi)).
(c) The convolution can also be interpreted as a moving weighted average of the
function f , where the weighting is determined by the function g. In view of
(a)(vi), f ∗ g can also be interpreted as a moving weighted average of g, where
the weighting is determined by f .
(d) If the function f (x) has large oscillations, sharp peaks or discontinuities,
then averaging about each point x will tend to decrease the oscillations, lower
the peaks and smooth out the discontinuities. In view of all this, convolution
acts as a smoothing operator. Let us mention two particular results in this
direction.
(i) If $\sup_{t\in\mathbb{R}}|f(t)|$ and $\int_{-\infty}^{\infty}|g(t)|\,dt$ are finite, then the function f ∗ g is
continuous on R.
(ii) If $\int_{-\infty}^{\infty}|f(t)|^2\,dt$ and $\int_{-\infty}^{\infty}|g(t)|^2\,dt$ are finite, then the function f ∗ g is
continuous on R.
(e) Convolutions arise as basic tools to describe input-output systems. Such
a system transforms a time-dependent input function u = u(t) into a time-dependent output function w = w(t) according to
$$w(t) = (f*u)(t) = \int_{-\infty}^{\infty} f(t-s)u(s)\,ds. \qquad (4.41)$$

In signal analysis, such an input-output system is called a filter. For example,


an electrical circuit with input and output lines can be described in this
way, and indeed much of mathematical systems theory has been developed
in this context. A filter may serve various purposes such as letting through
certain frequencies while blocking other ones and removing noise or blurring
in pictures.
We may write the system (4.41) in operator form as

w = S[u] .

The system is linear,


S[αu + βv] = αS[u] + βS[v] ,
and it is time-invariant, that is, if ũ(t) = u(t − h) is a translate of u, then
w̃ = S[ũ] satisfies w̃(t) = w(t − h), that is, w̃ is the corresponding translate
of w. This means that the behavior of the system does not change when time
passes.
We state some important properties of the convolution.
Theorem 52. Suppose that f and g are integrable on R. Then
$$\int_{-\infty}^{\infty}|(f*g)(t)|\,dt \le \int_{-\infty}^{\infty}|f(t)|\,dt \cdot \int_{-\infty}^{\infty}|g(t)|\,dt. \qquad (4.42)$$
For convolution in the time domain, we have
$$\widehat{f*g}(\xi) = \hat{f}(\xi)\hat{g}(\xi). \qquad (4.43)$$
For convolution in the frequency domain, we have
$$\widehat{fg}(\xi) = \frac{1}{2\pi}(\hat{f}*\hat{g})(\xi). \qquad (4.44)$$
Thus, under the action of the Fourier transform or its inverse, multiplication becomes convolution and vice versa. This is a major reason why convolution plays a prominent role in the calculus of Fourier transforms.
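The convolution theorem can be illustrated with f = g = H(t)e^{-t} from Example 227(b): a direct computation gives (f ∗ g)(t) = t e^{-t} for t ≥ 0, whose transform is indeed the product $\hat{f}(\xi)\hat{g}(\xi) = 1/(1+i\xi)^2$. The time-domain identity can be checked with a Riemann-sum convolution (a Python/NumPy sketch; the step size and truncation are illustrative choices):

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 10.0, dt)
f = np.exp(-t)                 # f(t) = H(t) e^{-t} sampled on t >= 0

# Riemann-sum approximation of the convolution integral (f*f)(t)
fg = np.convolve(f, f)[: len(t)] * dt

# Closed form: (f*f)(t) = t e^{-t} for t >= 0
for tau in (0.5, 1.0, 2.0, 5.0):
    k = int(round(tau / dt))
    assert abs(fg[k] - tau * np.exp(-tau)) < 1e-3
```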

4.3.3 Discrete Fourier Transform


In digital signal processing, signals are represented by sequences {xn }, also
written as {x(n)}, where n ranges over all integers. In other words, we consider
functions x whose domain is the integers, instead of functions defined on the
real numbers R.
More specifically, let x be a periodic sequence with period N , that is,
x(n + N ) = x(n) for all n. Any such sequence is completely specified by the
values x(0), x(1), . . . , x(N − 1). The (N -point) discrete Fourier transform
(DFT) of x, denoted by x̂, is the N -periodic sequence defined by
$$\hat{x}(n) = \sum_{j=0}^{N-1} x(j)e^{-2\pi i jn/N}, \qquad 0 \le n \le N-1, \qquad (4.45)$$
and extended by periodicity to all integer values of n.


The following theorem yields the inverse of the discrete Fourier transform.
Theorem 53. Let x be an N -periodic sequence x(n) with DFT x̂. Then
$$x(j) = \frac{1}{N}\sum_{n=0}^{N-1}\hat{x}(n)e^{2\pi i nj/N}, \qquad 0 \le j \le N-1. \qquad (4.46)$$

We define the convolution of N -periodic sequences.


Definition 82. Let x and y be N -periodic sequences. The circular convolution
of x and y is defined by
$$(x*y)(n) = \sum_{k=0}^{N-1} x(k)y(n-k). \qquad (4.47)$$
One may check immediately from (4.47) that x ∗ y is also N -periodic.


Theorem 54. Let x and y be N -periodic sequences with DFTs x̂ and ŷ. Then
$$\widehat{x*y}(n) = \hat{x}(n)\hat{y}(n), \qquad (4.48)$$
where $\widehat{x*y}$ denotes the DFT of x ∗ y.
If one computes the discrete Fourier transform of an N -periodic sequence
directly from its definition, one needs N multiplications (and N additions) for
each element x̂(n), thus N 2 multiplications for all elements x̂(0), . . . , x̂(N − 1).
However, due to specific properties of the factors e−2πijn/N , it is possible to
compute the DFT with only cN log2 N multiplications and additions, where
c is a small constant.
An algorithm for this purpose was discovered by James Cooley and John
Tukey, published in 1965, and is known as the fast Fourier transform
(FFT). Its history goes back to Carl Friedrich Gauss. The algorithm is based
on the recursive factorization of a particular matrix. When N is large, the
speedup from N 2 to cN log2 N is enormous. Indeed, the FFT is most widely
used in computations involving the Fourier transform in all areas of science
and technology.
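The agreement between the definition (4.45) and an FFT routine can be checked directly; library FFTs such as NumPy's follow exactly the conventions (4.45) and (4.46). A Python/NumPy sketch comparing a direct O(N²) evaluation with the O(N log N) routine:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)

idx = np.arange(N)
# Direct O(N^2) evaluation of (4.45): x_hat(n) = sum_j x(j) e^{-2*pi*i*j*n/N}
x_hat = np.array([np.sum(x * np.exp(-2j * np.pi * idx * n / N))
                  for n in range(N)])

# The FFT computes the same numbers in O(N log N) operations,
# and numpy.fft.ifft implements the inversion formula (4.46).
assert np.allclose(x_hat, np.fft.fft(x))
assert np.allclose(np.fft.ifft(x_hat), x)
```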

4.4 Integral Transforms


It is well known that various functions can be defined and expressed as
improper integrals of the form
$$F(y) = \int_{-\infty}^{\infty} K(x,y)f(x)\,dx.$$
A function defined in this manner is called an integral transform of f . The
function K(·, ·) appearing under the integral sign is called the kernel of the
transform. We have already discussed the Laplace transform and the Fourier
transform in Section 2.7 and in Section 4.3. Here we introduce the Fourier sine,
Fourier cosine and Hankel transforms.
Integral transforms are used very extensively in both pure and applied
mathematics, as well as in science and engineering. They are especially useful in solving certain boundary value problems, partial differential equations
and some types of integral equations. An integral transform is a linear and
invertible transformation, and a partial differential equation can be reduced
to a system of algebraic equations by application of an integral transform.
The algebraic problem is easy to solve for the transformed function, and the
function f can be recovered from F by some inversion formula.
1. The Fourier cosine transform $F_c\{f\} = \hat{f}_c$ of a function f is defined
by
$$F_c\{f\}(w) = \hat{f}_c(w) = \int_{0}^{\infty}\cos(wx)f(x)\,dx.$$

2. The Fourier sine transform $F_s\{f\} = \hat{f}_s$ of a function f is defined by
$$F_s\{f\}(w) = \hat{f}_s(w) = \int_{0}^{\infty}\sin(wx)f(x)\,dx.$$

3. The Hankel transform $H_n\{f\}$ of order n of a function f is defined by
$$(H_n\{f\})(s) = \int_{0}^{\infty} x f(x) J_n(sx)\,dx,$$
where $J_n$ is the Bessel function of the first kind of order n.
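As a quick numerical check of the first two definitions, take f(x) = e^{-x}; standard tables give $F_c\{f\}(w) = 1/(1+w^2)$ and $F_s\{f\}(w) = w/(1+w^2)$. A Python/NumPy sketch (truncating the integrals at x = 50, an illustrative choice):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

x = np.linspace(0.0, 50.0, 500001)
f = np.exp(-x)

for w in (0.0, 1.0, 2.5):
    fc = trapezoid(np.cos(w * x) * f, x)   # Fourier cosine transform at w
    fs = trapezoid(np.sin(w * x) * f, x)   # Fourier sine transform at w
    assert abs(fc - 1.0 / (1.0 + w**2)) < 1e-6
    assert abs(fs - w / (1.0 + w**2)) < 1e-6
```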

4.5 Sturm-Liouville Problems


We have seen that the trigonometric functions sine and cosine can be
used to represent functions in the form of Fourier series expansions. Now
we will generalize these ideas. The methods developed here will generally
produce solutions of various boundary value problems in the form of infinite
function series. Technical questions and issues, such as convergence, termwise
differentiation and integration, and uniqueness, will not be discussed in detail
in this chapter. Interested readers may acquire these details from the advanced
literature on these topics, such as the book by Folland [3].

4.5.1 Regular Sturm-Liouville Problems


In mathematical physics and other disciplines, a fairly large number of problems are formulated as boundary value problems involving second order ordinary differential equations. Therefore, let us consider the differential
equation
$$a(x)y''(x) + b(x)y'(x) + [c(x) + \lambda r(x)]y = 0, \qquad a < x < b, \qquad (4.49)$$

subject to some boundary conditions on a bounded interval [a, b]. We suppose
that the real functions a(x), b(x), c(x) and r(x) are continuous on the interval [a, b],
that λ is a parameter, and that a(x) is nonzero for every x ∈ [a, b]. It is much
more convenient to rewrite the differential equation (4.49) in its equivalent
form, the so-called Sturm-Liouville form
$$(p(x)y'(x))' + [q(x) + \lambda r(x)]y = 0,$$
where the real functions p(x), p′(x), r(x) are continuous on [a, b], and p(x)
and r(x) are positive on [a, b].
Remark 59. Any differential equation of the form (4.49) can be written in
Sturm-Liouville form.
Indeed, first divide (4.49) by a(x) to obtain
$$y''(x) + \frac{b(x)}{a(x)}y'(x) + \left[\frac{c(x)}{a(x)} + \lambda\frac{r(x)}{a(x)}\right]y = 0,$$
and then multiply by the integrating factor $p(x) = e^{\int b(x)/a(x)\,dx}$; since
$p'(x) = p(x)b(x)/a(x)$, the first two terms combine into $(p(x)y'(x))'$, and the
equation takes the Sturm-Liouville form above with $q(x) = p(x)c(x)/a(x)$ and
weight function $p(x)r(x)/a(x)$.

Definition 83. A regular Sturm-Liouville problem is a second order homogeneous linear differential equation of the form
$$(p(x)y'(x))' + [q(x) + \lambda r(x)]y(x) = 0, \qquad a < x < b, \qquad (4.50)$$
where p(x), p′(x), q(x) and r(x) are real continuous functions on a finite interval [a, b] and p(x), r(x) > 0 on [a, b], together with a set of homogeneous
boundary conditions of the form
$$\alpha_1 y(a) + \beta_1 y'(a) = 0,$$
$$\alpha_2 y(b) + \beta_2 y'(b) = 0, \qquad (4.51)$$
where $\alpha_1, \beta_1, \alpha_2, \beta_2$ are constants. We regard λ as an undetermined constant parameter.
Notation. Sometimes we will denote by L the following linear differential
operator:
$$L[y] = (p(x)y'(x))' + q(x)y(x). \qquad (4.52)$$
Using this operator, the differential equation (4.50) can be expressed in the
following form:
$$L[y] = -\lambda r y. \qquad (4.53)$$
We use the term linear differential operator because of the following important
linearity property of the operator L:
$$L[c_1y_1 + c_2y_2] = c_1L[y_1] + c_2L[y_2],$$
for any constants c1 and c2 and any twice differentiable functions y1 and y2 .



Remark 60. Different choices of the constants α’s and β’s in the boundary
conditions (4.51) yield special types of boundary conditions.
For β1 = β2 = 0, the conditions (4.51) are called Dirichlet boundary conditions.
For α1 = α2 = 0, the conditions (4.51) are called Neumann boundary conditions.
Another type of boundary conditions often encountered is the periodic
boundary conditions
$$y(a) = y(b), \qquad y'(a) = y'(b).$$

Our goal is to find all solutions of the Sturm-Liouville problem. It is clear
that y ≡ 0 is the trivial solution of (4.50) for every λ and satisfies the boundary conditions (4.51). However, we are interested here in finding those parameters
λ for which the Sturm-Liouville problem has non-trivial solutions y. Those
values of λ and the corresponding solutions y(x) have special names:
Definition 84. If y(x) ≢ 0 is a solution of a regular Sturm-Liouville problem
(4.50), (4.51) corresponding to some constant λ, then this solution is called
an eigenfunction corresponding (or associated) to the eigenvalue λ.
Let us now take an example.
Example 228. Find all eigenvalues and the corresponding eigenfunctions of
the following problem:
$$y''(x) + \lambda y(x) = 0, \qquad 0 < x < l,$$
subject to the boundary conditions y(0) = 0 and y(l) = 0.

Solution: First we observe that this is a regular Sturm-Liouville problem.
Indeed, the differential equation can be written in the form
$$(1\cdot y'(x))' + [0 + \lambda\cdot 1]y = 0,$$
so p(x) = 1, q(x) = 0 and r(x) = 1.


Case 1: λ = 0. One can verify that λ = 0 is not an eigenvalue.
Case 2: λ < 0. It can be verified that no λ < 0 is an eigenvalue.
Case 3: λ > 0. The eigenvalues for this problem are
$$\lambda_n = \frac{n^2\pi^2}{l^2}, \qquad n = 1, 2, 3, \dots,$$
and the corresponding eigenfunctions are
$$y_n(x) = \sin\!\left(\frac{n\pi x}{l}\right), \qquad n = 1, 2, 3, \dots.$$
It can be proved in general that every eigenvalue of the regular Sturm-
Liouville problem is a real number.
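Example 228 can also be approached numerically: replacing y′′ by a central difference on a uniform grid, with the Dirichlet conditions built in, turns the problem into a matrix eigenvalue problem whose smallest eigenvalues approximate n²π²/l². A Python/NumPy sketch (the interval length l = 3 and grid size m = 1000 are illustrative choices):

```python
import numpy as np

l = 3.0
m = 1000                 # number of interior grid points (illustrative)
h = l / (m + 1)

# -y''(x_i) ~ (2*y_i - y_{i-1} - y_{i+1}) / h^2  with  y_0 = y_{m+1} = 0,
# which gives a symmetric tridiagonal matrix.
A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
eigenvalues = np.sort(np.linalg.eigvalsh(A))

for n in (1, 2, 3):
    exact = (n * np.pi / l)**2        # lambda_n = n^2 pi^2 / l^2
    assert abs(eigenvalues[n - 1] - exact) / exact < 1e-4
```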

Properties of the regular Sturm-Liouville problem are:



(a) There exist an infinite number of real eigenvalues that can be arranged
in increasing order λ1 < λ2 < λ3 < · · · < λn < · · · such that λn → ∞
as n → ∞.
(b) For each eigenvalue there is only one eigenfunction (except for constant
multiples).
(c) Eigenfunctions corresponding to different eigenvalues are linearly independent.
(d) The set of eigenfunctions corresponding to the set of eigenvalues is orthogonal with respect to the weight function r(x) on the interval [a, b].

4.6 Application of Fourier Methods to Signal Analysis


Signals are ubiquitous. They are synonymous with functions. A square integrable function (Section 4.3.1) is also called a signal with finite energy. A
signal is characterized by its energy, defined as
$$\|f\| = \left(\int_a^b |f(t)|^2\,dt\right)^{1/2}.$$

The train’s whistle and the blinking of a car’s beam can be viewed as quan-
tities varying in time which contain information; they are examples of time
dependent signals. Early in human history, signals of smoke by day and of fire
by night have been used to transmit information. In recent times, telegraph,
telephone, radio, television and radar have been used as signal transmitters.
A radio signal consists of a sine or cosine wave with radio frequency, called
the carrier wave, which has been modulated with the information to be trans-
mitted.
Signals can be divided into two categories, analog signals (functions de-
fined on a continuum of numbers, for example an interval in R) and digital
signals, which are defined on a discrete set like the integers. In 1949, Claude
E. Shannon of Bell Telephone Laboratories published a mathematical result
now known as the Shannon sampling theorem. This result provided the foun-
dation for digital signal processing. It tells us that if the range of frequencies
of a signal measured in cycles per second does not exceed n, then the time-
continuous signal can be reconstructed with complete accuracy by measuring
its amplitude 2n times a second.
The study of signals is relevant not only in telecommunication, but also
in telemetry, astronomy, oceanography, optics, crystallography, geophysics,
bioengineering, bioinformatics and medicine.

Shannon sampling theorem


Let us assume that a signal f is band-limited, that is,
$$\hat{f}(\xi) = 0 \quad\text{for all } |\xi| > l, \qquad (4.54)$$
holds for some l > 0; the smallest such number l is called the bandwidth
of the signal. This means that the total frequency content of the signal f lies
in the band (or interval) [−l, l]. Moreover, let us assume that f is integrable
and square integrable on R, that is, the signal f has finite energy. By Theorem 50,
f can be recovered from its Fourier transform as
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat{f}(\xi)e^{i\xi t}\,d\xi.$$
Because of (4.54),
$$f(t) = \frac{1}{2\pi}\int_{-l}^{l}\hat{f}(\xi)e^{i\xi t}\,d\xi. \qquad (4.55)$$

We expand fˆ on [−l, l] in a complex Fourier series,
$$\hat{f}(\xi) = \sum_{n=-\infty}^{\infty} c_n e^{n\pi i\xi/l}, \qquad (4.56)$$
where
$$c_n = \frac{1}{2l}\int_{-l}^{l}\hat{f}(\xi)e^{-n\pi i\xi/l}\,d\xi. \qquad (4.57)$$

We insert (4.56) in (4.55) and obtain
$$f(t) = \frac{1}{2\pi}\int_{-l}^{l}\hat{f}(\xi)e^{i\xi t}\,d\xi
= \frac{1}{2\pi}\int_{-l}^{l}\sum_{n=-\infty}^{\infty} c_n e^{n\pi i\xi/l}e^{i\xi t}\,d\xi
= \frac{1}{2\pi}\sum_{n=-\infty}^{\infty} c_n\int_{-l}^{l} e^{n\pi i\xi/l}e^{i\xi t}\,d\xi. \qquad (4.58)$$
(Interchanging the integral with the sum is possible by a general property of
Fourier series of functions of finite energy. Note that fˆ has finite energy by
Parseval’s formula (4.36).) Next, we compare (4.55) for t = −nπ/l with (4.57)
and see that
$$c_n = \frac{\pi}{l}f\!\left(-\frac{n\pi}{l}\right). \qquad (4.59)$$
We insert this value into (4.58) and compute
$$f(t) = \frac{1}{2\pi}\sum_{n=-\infty}^{\infty}\frac{\pi}{l}f\!\left(-\frac{n\pi}{l}\right)\int_{-l}^{l} e^{n\pi i\xi/l}e^{i\xi t}\,d\xi
= \frac{1}{2l}\sum_{n=-\infty}^{\infty}f\!\left(\frac{n\pi}{l}\right)\int_{-l}^{l} e^{-n\pi i\xi/l}e^{i\xi t}\,d\xi$$
(we have replaced n by −n, as n ranges over all integers)
$$= \frac{1}{2l}\sum_{n=-\infty}^{\infty}f\!\left(\frac{n\pi}{l}\right)\int_{-l}^{l} e^{i\xi(t-n\pi/l)}\,d\xi
= \frac{1}{2l}\sum_{n=-\infty}^{\infty}f\!\left(\frac{n\pi}{l}\right)\frac{1}{i(t-n\pi/l)}\Big[e^{i\xi(t-n\pi/l)}\Big]_{\xi=-l}^{\xi=l}$$
$$= \sum_{n=-\infty}^{\infty}f\!\left(\frac{n\pi}{l}\right)\frac{1}{lt-n\pi}\cdot\frac{1}{2i}\big[e^{i(lt-n\pi)} - e^{-i(lt-n\pi)}\big],$$
so finally we arrive at the Whittaker-Shannon interpolation formula
$$f(t) = \sum_{n=-\infty}^{\infty}f\!\left(\frac{n\pi}{l}\right)\frac{\sin(lt-n\pi)}{lt-n\pi}. \qquad (4.60)$$

This is the main content of Shannon’s sampling theorem, which states that a
function of bandwidth l can be completely recovered by (4.60) from its values
at the points nπ/l, where n = 0, ±1, ±2, . . . . This result forms the basis for the
conversion between analog and digital signals. If we convert an analog signal f
of bandwidth l into a digital signal by evaluating it at times t = 0, ±π/l, ±2π/l, . . . ,
we can convert it back to an analog signal without loss of information, at least
in principle. Note that an exact evaluation of (4.60) involves an infinite sum
of values of f at arbitrarily large positive and negative times. Developing
suitable approximations, from both the theoretical and practical standpoints,
is one of the areas of signal analysis and signal processing.
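A truncated version of (4.60) already reconstructs a band-limited signal to high accuracy. The sketch below (Python/NumPy; the test signal and the truncation N = 500 are illustrative choices) uses f(t) = (sin πt/(πt))², whose transform is a triangle supported on [−2π, 2π], so l = 2π and the samples sit at t = n/2:

```python
import numpy as np

l = 2.0 * np.pi                      # bandwidth of the test signal
f = lambda s: np.sinc(s) ** 2        # numpy's sinc(s) = sin(pi*s)/(pi*s)

def reconstruct(t, N=500):
    """Truncated Whittaker-Shannon series (4.60), keeping terms |n| <= N."""
    n = np.arange(-N, N + 1)
    samples = f(n * np.pi / l)                       # values at t = n/2
    kernel = np.sinc((l * t - n * np.pi) / np.pi)    # sin(lt - n*pi)/(lt - n*pi)
    return float(np.sum(samples * kernel))

for t in (0.3, 0.77, 1.5):
    assert abs(reconstruct(t) - f(t)) < 1e-4
```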

4.7 Exercises
4.1. Discuss the relationship between linear independence and orthonormal-
ity. Can you convert an orthogonal system into an orthonormal system?
4.2. (a) Show that f (x) = eˣ and g(x) = sin x are orthogonal on the interval
[π/4, 5π/4].
(b) Show that {cos x, cos 3x, cos 5x, . . .} is an orthogonal set on [0, π/2].
4.3. Expand the following functions into Fourier series on the given interval.
(a) f (x) = x + π, −π < x < π.
(b) f (x) = e−8x for −4 ≤ x ≤ 4.
(c) $f(x) = \begin{cases} 0, & -\pi < x < 0 \\ x^2, & 0 \le x < \pi. \end{cases}$

4.4. Show that
(a) $\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \dots$ (using 10.8.3 (a)),
(b) $\dfrac{\pi^2}{6} = 1 + \dfrac{1}{2^2} + \dfrac{1}{3^2} + \dots$ (using 10.8.3 (c)).

4.5. Compare the graph of the function f (x) = x2 with the 3rd , 6th , 10th
and 13th partial sums of its Fourier series on the interval [−2, 2].
4.6. Write the complex Fourier series of the following functions.
(a) f (x) = cos x, 0 ≤ x < 1, and f has period 1.
(b) f has period 4, and $f(x) = \begin{cases} 0, & 0 \le x < 1 \\ 1, & 1 \le x < 4. \end{cases}$

4.7. Expand the following functions in an appropriate cosine and sine Fourier
series.
(a) f (x) = x³, −π < x < π.
(b) $f(x) = \begin{cases} x - 1, & -\pi < x < 0 \\ x + 1, & 0 \le x < \pi. \end{cases}$

4.8. Let f (x) = 4 sin x, 0 < x < π, f (x + π) = f (x). Sketch this function and
its Fourier series. Find the frequency spectrum of f .
4.9. Let f be integrable on [−l, l].
(a) Prove that
$$\frac{1}{2}a_0^2 + \sum_{n=1}^{\infty}\big(a_n^2 + b_n^2\big) \le \frac{1}{l}\int_{-l}^{l}|f(x)|^2\,dx,$$
where a0 , an , bn are the Fourier coefficients of f .
(b) Prove Parseval’s identity
$$\frac{1}{2}a_0^2 + \sum_{n=1}^{\infty}\big(a_n^2 + b_n^2\big) = \frac{1}{l}\int_{-l}^{l}|f(x)|^2\,dx$$
if $\int_{-l}^{l}|f(x)|^2\,dx$ is finite and a0 , an and bn are its Fourier coefficients.
(c) Show that an → 0 and bn → 0 as n → ∞.
(c) Show that an → 0 and bn → 0 as n → ∞.
4.10. Find the Fourier transforms of the following functions.
(a) f (t) = te−|t| for all real t.
(b) $f(t) = \begin{cases} \sin(\pi t), & -5 \le t \le 5 \\ 0, & |t| > 5. \end{cases}$
(c) $f(t) = \begin{cases} 1, & 0 \le t \le k \\ -1, & -k \le t < 0 \\ 0, & |t| > k, \end{cases}$ where k is a positive constant.

4.11. Prove Theorem 46 (b): Let ξ0 be a real number, let g(t) = eiξ0 t f (t),
then ĝ(ξ) = fˆ(ξ − ξ0 ).

4.12. (a) For g(t) = f (−t), show that ĝ(ξ) = fˆ(−ξ).
(b) For g(t) = fˆ(t), show that ĝ(ξ) = 2πf (−ξ).

4.13. Prove that (f ∗ g)(x) = (g ∗ f )(x) for all x.


4.14. Let f (x) = e−ax χ(0,∞) (x) and g(x) = e−bx χ(0,∞) (x), where
$$\chi_{(0,\infty)}(x) = \begin{cases} 1, & x \in (0,\infty) \\ 0, & x \notin (0,\infty). \end{cases}$$
Calculate (f ∗ g)(x).

4.15. Find fˆ and ĝ if
$$f(t) = \chi_{[-1/2,1/2]}(t) = \begin{cases} 1, & -1/2 \le t \le 1/2 \\ 0, & t < -1/2 \text{ or } t > 1/2. \end{cases}$$

4.16. Let f and g be square integrable. Show that the convolution f ∗ g is a


continuous function on R.
4.17. Let x = x(n) and y = y(n) be N -periodic signals. Prove that (x∗y)(n) =
(y ∗ x)(n).

4.8 Suggestion for Further Reading


For applications of Fourier methods to sound, music, computers, X-ray
crystallography, computerized tomography, nuclear magnetic resonance and
radio astronomy, we refer to [2] and [4].
We suggest reading Chapter 3 of [1], Chapter 12, Section 12.5 of [5], and [3] on the Sturm-Liouville problem. For acquiring a good
knowledge of the Haar system, the Walsh system and wavelets constructed through Walsh
systems, we suggest the forthcoming book on wavelets constructed
through Walsh functions by Farkov, Manchanda and Siddiqi (Springer),
due for publication in December 2017.
Bibliography

[1] K. Adzievski, A. H. Siddiqi, Introduction to Partial Differential Equations


for Scientists and Engineers Using Mathematica, CRC Press, 2014.

[2] M. Cartwright, Fourier Methods for Mathematicians, Scientists and Engineers, Ellis Horwood Series in Mathematics Applications, New York, 1990.
[3] G. B. Folland, Fourier Series and Its Applications, Wadsworth, Pacific
Grove, CA, 1992.
[4] E. Prestini, The Evolution of Applied Harmonic Analysis: Models of the
Real World, Birkhäuser, 2004.
[5] D. G. Zill, Michael R. Cullen, Advanced Engineering Mathematics, Jones
and Bartlett Publishers, 2012.

Chapter 5
Applied Partial Differential
Equations

5.1 Introduction to Partial Differential Equations . . . . . . . . . . . . . . . . . . . 389


5.2 Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
5.2.1 Solution Using Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
5.2.2 Solution Using Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . 397
5.2.3 Solution Using Laplace Transform . . . . . . . . . . . . . . . . . . . . . . 398
5.3 Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
5.3.1 Solution Using Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
5.3.2 Solution Using Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . 402
5.3.3 Solution Using Laplace Transform . . . . . . . . . . . . . . . . . . . . . . 403
5.4 Laplace Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
5.4.1 Solution Using Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
5.4.2 Solution Using Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . 411
5.4.3 Solution Using Laplace transform . . . . . . . . . . . . . . . . . . . . . . . 412
5.5 Poisson Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
5.6 Simulation of Heat Equation, Wave Equation and Laplace
Equation by MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
5.7 Solving Partial Differential Equation by MATHEMATICA . . . . . 430
5.8 Practical Applications in Physics and Mechanics . . . . . . . . . . . . . . . 433
5.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
5.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441

5.1 Introduction to Partial Differential Equations


Often real world problems are described by partial differential equations
(PDEs) with or without boundary and initial value conditions. PDEs involve
unknown functions of two or more variables and their partial derivatives. There
is a wide class of PDEs which play a significant role in electromagnetic theory,
fluid dynamics, traffic flow, medical imaging, financial engineering and many
other disciplines. In this chapter we focus on basic ingredients of PDEs and
applications of important classes such as the wave equation, heat equation, and Laplace equation.

390 Modern Engineering Mathematics

Basic concepts and terminology


Let Ω be a nonempty open and connected subset of R^2 or R^3. It
may be recalled that Ω is called open if at every point P ∈ Ω there is an open
disc or ball (without circumference or surface) with center P which is a subset
of Ω. Note that Ω is called connected if any two points in Ω can be joined by
a polygonal line which lies entirely in Ω. If u = u(x, y, ...), (x, y, ...) ∈ Ω, is a
function of two or more variables, then the partial derivatives of u of the first
order will be denoted by

∂u/∂x or ux ,   ∂u/∂y or uy .

Similarly, for partial derivatives of the second order, we will use the notation

∂^2 u/∂x^2 ,   ∂^2 u/∂y∂x ,   ∂^2 u/∂y^2 .

The class C^k (Ω), k = 1, 2, ..., is defined as the set of functions u = u(x, y),
(x, y) ∈ Ω, all of whose partial derivatives up to order k are continuous
in Ω. These concepts can be generalized to more variables.
A partial differential equation for a function u = u(x, y, z, . . . ) contains
the function u and some of its partial derivatives:
F (x, y, u, ux , uy , uxx , uxy , uyy , ...) = 0. (5.1)
A solution of a partial differential equation of order k is a function u =
u(x, y, z, . . . ) ∈ C k (Ω) which together with partial derivatives satisfies the
PDE.
Examples of PDEs and their solutions
Example 229. Show that u(x, y) = ln(x2 + y 2 ) is a solution of the PDE
uxx + uyy = 0
known as the Laplace equation.
Solution:
ux = 2x/(x^2 + y^2),   uxx = (−2x^2 + 2y^2)/(x^2 + y^2)^2

uy = 2y/(x^2 + y^2),   uyy = (−2y^2 + 2x^2)/(x^2 + y^2)^2 .
It is clear that
uxx + uyy = 0.
Example 230. Show that u(x, t) = sin x cos t is a solution of the PDE
uxx = utt .
This type of PDE is known as a one-dimensional wave equation.

Solution:
ux = cos x cos t, uxx = − sin x cos t,
ut = − sin x sin t, utt = − sin x cos t,
and it is clear that
uxx = utt .
Example 231. Show that u(x, t) = t^(−1/2) e^(−x^2/t) is a solution of the PDE

ut = (1/4) uxx .

This type of equation is known as a one-dimensional heat equation or diffusion
equation.
Solution:

ut = −(1/2) t^(−3/2) e^(−x^2/t) + x^2 t^(−5/2) e^(−x^2/t)

ux = −2x t^(−3/2) e^(−x^2/t)

uxx = −2 t^(−3/2) e^(−x^2/t) + 4x^2 t^(−5/2) e^(−x^2/t)

so, clearly,

ut = (1/4) uxx .
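All three verifications can be cross-checked numerically. The following sketch (in Python; the sample points such as (1.3, 0.7) are arbitrary choices, not from the text) approximates the partial derivatives by central differences and confirms that each PDE residual is essentially zero.

```python
import math

def pd2(f, x, y, h=1e-4):
    # second partial derivatives of f with respect to each argument,
    # by central differences
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    return fxx, fyy

def pd1_second_arg(f, x, y, h=1e-6):
    # first partial derivative with respect to the second argument
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

# Example 229: u = ln(x^2 + y^2) solves u_xx + u_yy = 0
u1 = lambda x, y: math.log(x**2 + y**2)
fxx, fyy = pd2(u1, 1.3, 0.7)
res_laplace = abs(fxx + fyy)

# Example 230: u = sin x cos t solves u_xx = u_tt
u2 = lambda x, t: math.sin(x) * math.cos(t)
uxx2, utt2 = pd2(u2, 0.4, 0.9)
res_wave = abs(uxx2 - utt2)

# Example 231: u = t^(-1/2) e^(-x^2/t) solves u_t = (1/4) u_xx
u3 = lambda x, t: t**-0.5 * math.exp(-x**2 / t)
uxx3, _ = pd2(u3, 0.5, 1.0)
ut3 = pd1_second_arg(u3, 0.5, 1.0)
res_heat = abs(ut3 - 0.25 * uxx3)

print(res_laplace, res_wave, res_heat)  # all tiny (about 1e-6 or smaller)
```

The same central-difference device can be reused for any candidate solution in the exercises.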
Example 232. Find the general solution of the PDE

uxx (x, y) = 0.

Solution: This equation means that ux (x, y) does not depend on x. Therefore,
ux (x, y) = A(y), where A(y) is an arbitrary function. Integrating the last
equation with respect to x (keeping y constant), it follows that the general
solution of the equation is given by

u(x, y) = A(y)x + B(y)

where A(y) and B(y) are arbitrary functions.


Example 233. Find the general solution of the PDE

uyx + uy = 0.

Solution: In order to get the general solution, we convert this PDE into an
ordinary differential equation (ODE) by the substitution w = w(x, y) = uy .
Then the given equation becomes

wx + w = 0.

The general solution of the above ODE is given by

w(x, y) = a(y)e−x

where a(y) is any differentiable function. Therefore,

uy = a(y)e−x .

Integrating this equation with respect to y (keeping x constant), the general


solution of the given PDE is

u(x, y) = e^(−x) f (y) + g(x)

where f is any differentiable function with df /dy = a(y) and g is an arbitrary
function.
Boundary and initial conditions
Boundary conditions are constraints imposed on the unknown function u(x, y)
satisfying the PDE on the boundary of the domain Ω, denoted by ∂Ω. There
are three important classes of boundary conditions:
1. Dirichlet conditions

u(x, y) = f (x, y) for x, y ∈ ∂Ω.

2. Neumann conditions

∂u/∂n (x, y) = h(x, y) for x, y ∈ ∂Ω

where ∂u/∂n denotes the derivative of u in the direction of the outward unit
normal vector to ∂Ω.
3. Robin mixed conditions

αu(x, y) + β ∂u/∂n (x, y) = l(x, y) for x, y ∈ ∂Ω

where α, β are nonzero constants or functions, and l(x, y) is a function defined
on ∂Ω.
Initial conditions
When PDEs involve the time variable t, then we have to consider initial con-
ditions, called Cauchy conditions. These conditions specify the values of the
unknown function and its time derivatives at the initial time t = t0 .
If the functions f (x, y), h(x, y), and l(x, y) are identically zero on their
domain ∂Ω, they are called homogeneous boundary conditions; otherwise they
are called non-homogeneous conditions.
A differential equation is said to be linear if the function F is algebraically

linear in each of the variables u, ux , uy , uxx , uxy , uyy , ..., and if the coefficients
of u and its derivatives are functions only of the independent variables. An
equation that is not linear is said to be non-linear; a non-linear equation is
quasi-linear if it is linear in the highest order derivatives.
For example, an equation of the form

a(x, y)ux + b(x, y)uy + c(x, y)u(x, y) = 0

is a first order linear equation. The following equation

a(x, y, u)ux + b(x, y, u)uy + c(x, y, u)u(x, y) = 0

is a first order quasi-linear equation.

Classification of linear second order partial differential equations


The second order PDE

auxx + buxy + cuyy + dux + euy + f u = 0

where a, b, c, d, e, f are real constants, is called

Elliptic if b^2 − 4ac < 0

Parabolic if b^2 − 4ac = 0

Hyperbolic if b^2 − 4ac > 0

Example 234. Classify the following PDEs

(i) (1/4)uxx = ut (ii) uxx = uyy (iii) uxx + uyy = 0 (iv) uxx − uxy − 3uyy = 0.
Solution: (i) (1/4)uxx − ut = 0. a = 1/4, b = 0, c = 0, d = 0, e = −1, and f = 0.
b^2 − 4ac = 0. Then the PDE is parabolic.
(ii) uxx − uyy = 0. a = 1, b = 0, c = −1, d = 0, e = f = 0. b^2 − 4ac = 4 > 0
and the PDE is hyperbolic.
(iii) uxx + uyy = 0. a = 1, b = 0, c = 1, d = 0, e = 0, f = 0. b^2 − 4ac = −4 < 0.
So the PDE is elliptic.
(iv) uxx − uxy − 3uyy = 0. a = 1, b = −1, c = −3, d = 0, e = 0, f = 0.
b^2 − 4ac = 13 > 0 and the equation is hyperbolic.
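The discriminant test is mechanical enough to put into a few lines of code. The helper below (an illustrative sketch, not part of the text) reproduces the four classifications of Example 234.

```python
def classify(a, b, c):
    # classification of a*u_xx + b*u_xy + c*u_yy + (lower order terms) = 0
    d = b**2 - 4 * a * c
    if d < 0:
        return "elliptic"
    if d == 0:
        return "parabolic"
    return "hyperbolic"

print(classify(0.25, 0, 0))  # (i)   parabolic
print(classify(1, 0, -1))    # (ii)  hyperbolic
print(classify(1, 0, 1))     # (iii) elliptic
print(classify(1, -1, -3))   # (iv)  hyperbolic
```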
Some well-known linear PDEs
1. Laplace’s equation
uxx + uyy = 0
2. Helmholtz’s equation

−(uxx + uyy ) = λu

3. Heat (or diffusion) equation

ut = a^2 uxx

4. Wave equation

utt = c^2 uxx

5. Schrödinger equation

iut + uxx + uyy = 0

6. Telegraph equation

utt + dut − uxx = 0

7. Airy’s equation

ut + uxxx = 0

8. Beam equation

utt + uxxxx = 0

Well-known non-linear PDEs


1. Eikonal equation

√((ux)^2 + (uy)^2) = 1
2. Non-linear Poisson equation

−(uxx + uyy ) = f (u)

3. Scalar conservation law

ut + div F (u) = 0

4. Inviscid Burgers’ equation

ut + uux = 0

5. Non-linear heat equation

ut − uxx = f (u)

6. Non-linear wave equation

utt − uxx = f (u)

7. Korteweg-deVries (KDV) equation

ut + uux + uxxx = 0
8. Maxwell’s equations

div E = 0, curl E = −(1/c) Ht

div H = 0, curl H = (1/c) Et

where E = E(x, y, z) and H = H(x, y, z) represent the electric and magnetic
fields in empty space, and c is the speed of light.
9. Euler’s equations for incompressible, inviscid flow

ut + u · ∇u = −∇p
div u = 0

10. Navier-Stokes equations for incompressible, viscous flow

ut + u · ∇u − ∆u = −∇p
div u = 0

5.2 Heat Equation


Consider the homogeneous heat equation

ut = a^2 uxx , 0 < x < l, t > 0 (5.2)


which satisfies the initial condition

u(x, 0) = f (x), 0 < x < l, (5.3)

and the homogeneous Dirichlet boundary conditions

u(0, t) = u(l, t) = 0, t > 0 (5.4)

where u(x, t) denotes the temperature distribution and a^2 the thermal diffu-
sivity.
The equation, in its simplest form, goes back to the beginning of the 19th
century. Besides modelling temperature distribution it has been used to model
the following physical phenomena:
• Diffusion of one material within another such as smoke particles in air.
• Chemical reactions, such as the Belousov-Zhabotinsky reaction which
exhibits fascinating wave structure.
• Electrical activity in the membranes of living organisms, for example,
the Hodgkin-Huxley model.

• Population dispersion; individuals move randomly to avoid over-
crowding.
• Pursuit and evasion in predator-prey systems.
• Dispersion of pollutants in running streams.
More recently it has been used in financial mathematics or financial engi-
neering for determining appropriate prices of options.

5.2.1 Solution Using Fourier Series


We will solve the above initial boundary value problem using the Fourier
series method.
Let
u(x, t) = X(x)T (t) (5.5)
where X(x) and T (t) are functions of single variables x and t, respectively.
Differentiating (5.5) with respect to x and t, and substituting the partial
derivatives in Equation (5.2), we obtain

X''(x)/X(x) = (1/a^2) T'(t)/T(t). (5.6)

Equation (5.6) holds identically for every 0 < x < l and every t > 0. Since x
and t are independent variables, Equation (5.6) is possible only if each side
is equal to the same constant λ:

X''(x)/X(x) = (1/a^2) T'(t)/T(t) = λ.
This leads us to two ordinary differential equations:
X 00 (x) − λX(x) = 0 (5.7)
and
T 0 (t) − a2 λT (t) = 0. (5.8)
From the boundary conditions (5.4), we obtain
X(0)T (t) = X(l)T (t) = 0, t > 0.
This implies that
X(0) = X(l) = 0 (5.9)
because T (t) = 0 for every t > 0 would lead to the trivial solution u(x, t) = 0.
Solving the eigenvalue problem (5.7) and (5.9), as in Chapter 3, it follows that
the eigenvalues λn and the corresponding eigenfunctions Xn (x) are given by

λn = −(nπ/l)^2 , Xn (x) = sin(nπx/l), n = 1, 2, . . . . (5.10)
The solution of equation (5.8) corresponding to the above λn
is given by

Tn (t) = cn e^(−n^2 π^2 a^2 t/l^2 ) , n = 1, 2, ... (5.11)

where cn are constants to be determined. Therefore, we obtain a sequence of
functions

un (x, t) = cn e^(−n^2 π^2 a^2 t/l^2 ) sin(nπx/l), n = 1, 2, ... (5.12)
each of which satisfies the heat equation (5.2) and the boundary conditions
(5.4). Since the heat equation and the boundary conditions are linear and
homogeneous, a function u(x, t) of the form

u(x, t) = Σ_{n=1}^∞ un (x, t) = Σ_{n=1}^∞ cn e^(−n^2 π^2 a^2 t/l^2 ) sin(nπx/l) (5.13)

will also satisfy the heat equation and the boundary conditions. If we assume
that the above series is convergent and the initial condition (5.3) is satisfied,
we obtain

f (x) = u(x, 0) = Σ_{n=1}^∞ cn sin(nπx/l), 0 < x < l.

By Section 4.2.2, the above is the Fourier sine series of f (x).
Therefore, the coefficients cn are given by

cn = (2/l) ∫_0^l f (x) sin(nπx/l) dx, n = 1, 2, . . . . (5.14)
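As a concrete illustration of (5.13)-(5.14), the sketch below takes an assumed initial profile f(x) = x(l − x) with l = a = 1 (hypothetical choices, for illustration only), computes the coefficients c_n by the trapezoidal rule, and sums a partial series. At t = 0 the series reproduces f, and for t > 0 the temperature decays.

```python
import math

l, a = 1.0, 1.0                 # assumed rod length and diffusivity
f = lambda x: x * (l - x)       # assumed initial temperature profile

def c_n(n, m=2000):
    # c_n = (2/l) * integral_0^l f(x) sin(n pi x / l) dx, trapezoidal rule
    h = l / m
    s = sum((0.5 if k in (0, m) else 1.0)
            * f(k * h) * math.sin(n * math.pi * k * h / l)
            for k in range(m + 1))
    return (2 / l) * s * h

def u(x, t, terms=50):
    # partial sum of the series (5.13)
    return sum(c_n(n) * math.exp(-(n * math.pi * a / l)**2 * t)
               * math.sin(n * math.pi * x / l)
               for n in range(1, terms + 1))

print(round(u(0.5, 0.0), 4))    # reproduces f(0.5) = 0.25
```

Evaluating u(0.5, t) for increasing t shows the exponential decay predicted by (5.13).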

5.2.2 Solution Using Fourier Transform


We discussed the Fourier transform in Chapter 4. In this section we study
its application in solving initial value problems for the heat equation.
Notation. We denote by U (ω, t) the Fourier transform of u(x, t) with respect
to x. Similarly, the Fourier transforms of y(x, t) and f (x, t) are, respectively,
Y (ω, t) and F (ω, t).
Consider the homogeneous heat equation with initial condition

ut = a^2 uxx , −∞ < x < ∞, t > 0
u(x, 0) = f (x), −∞ < x < ∞. (5.15)

Keeping the above notations in mind, the problem (5.15) can be written in
the form of the following ordinary differential equation

dU/dt = −a^2 ω^2 U , U (ω, 0) = F (ω)
whose solution is

U (ω, t) = F (ω) e^(−a^2 ω^2 t) .

Now, from the inverse Fourier transform discussed in Chapter 4, it follows
that

u(x, t) = (1/2π) ∫_{−∞}^∞ U (ω, t) e^(iωx) dω = (1/2π) ∫_{−∞}^∞ F (ω) e^(−a^2 ω^2 t) e^(iωx) dω

= (1/2π) ∫_{−∞}^∞ [ ∫_{−∞}^∞ f (ξ) e^(−iωξ) dξ ] e^(−a^2 ω^2 t) e^(iωx) dω

= (1/2π) ∫_{−∞}^∞ [ ∫_{−∞}^∞ e^(−a^2 ω^2 t) e^(iω(x−ξ)) dω ] f (ξ) dξ

= (1/π) ∫_{−∞}^∞ [ ∫_0^∞ e^(−a^2 ω^2 t) cos(ω(x − ξ)) dω ] f (ξ) dξ

= (1/(2a√(πt))) ∫_{−∞}^∞ e^(−(x−ξ)^2/(4a^2 t)) f (ξ) dξ.

In the above we have used the following result

∫_0^∞ e^(−a^2 ω^2 t) cos(ω(x − ξ)) dω = (1/(2a)) √(π/t) e^(−(x−ξ)^2/(4a^2 t)) .

This result can be derived by differentiating, with respect to λ, the function

g(λ) = ∫_0^∞ e^(−a^2 ω^2 t) cos(ωλ) dω.
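The Gaussian cosine integral used in the last step evaluates in closed form to (1/(2a))√(π/t) e^(−λ^2/(4a^2 t)) with λ = x − ξ. A numerical spot check (the values of a, t, λ below are arbitrary samples):

```python
import math

a, t, lam = 1.0, 0.5, 0.7       # arbitrary sample values

def lhs(m=200000, upper=40.0):
    # trapezoidal rule; the Gaussian integrand is negligible beyond `upper`
    h = upper / m
    s = 0.0
    for k in range(m + 1):
        w = k * h
        weight = 0.5 if k in (0, m) else 1.0
        s += weight * math.exp(-a**2 * w**2 * t) * math.cos(w * lam)
    return s * h

rhs = (1 / (2 * a)) * math.sqrt(math.pi / t) * math.exp(-lam**2 / (4 * a**2 * t))
print(abs(lhs() - rhs))          # essentially zero
```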

5.2.3 Solution Using Laplace Transform


Let us first introduce the following notations for Laplace transforms

Lu(x, t) = U (x, s), Ly(x, t) = Y (x, s), Lf (x, t) = F (x, s).

In this section we solve the initial value and boundary value problems for the
heat equation using the Laplace transforms.
Consider the problem

ut = uxx , 0 < x < 1, t > 0
u(0, t) = 0, u(1, t) = 1, t > 0 (5.16)
u(x, 0) = 0, 0 < x < 1.

Applying the Laplace transform to both sides of the heat equation and
using the initial condition, we have

d^2 U/dx^2 − sU = 0.

The general solution of this ordinary differential equation is

U (x, s) = c1 cosh(√s x) + c2 sinh(√s x).

Using the given boundary conditions, we get

U (0, s) = L(u(0, t)) = 0, U (1, s) = L(u(1, t)) = L(1) = 1/s.

These boundary conditions for U (x, s) imply

U (x, s) = (1/s) sinh(√s x)/sinh(√s) = (1/s) [e^((x−1)√s) − e^(−(x+1)√s)]/(1 − e^(−2√s)).
Using the geometric series

1/(1 − e^(−2√s)) = Σ_{n=0}^∞ e^(−2n√s)

we obtain

U (x, s) = Σ_{n=0}^∞ [ e^(−(2n+1−x)√s)/s − e^(−(2n+1+x)√s)/s ].
Therefore, from the Laplace transform (Table 2.2) we get

u(x, t) = Σ_{n=0}^∞ [ L^(−1)( e^(−(2n+1−x)√s)/s ) − L^(−1)( e^(−(2n+1+x)√s)/s ) ]

= Σ_{n=0}^∞ [ erfc( (2n + 1 − x)/(2√t) ) − erfc( (2n + 1 + x)/(2√t) ) ]

where the complementary error function erfc(·) is defined by

erfc(x) = (2/√π) ∫_x^∞ e^(−t^2) dt.
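Since L^(−1)(e^(−k√s)/s) = erfc(k/(2√t)), the series can be summed with the complementary error function erfc(z) = 1 − erf(z), which Python's math module provides. The partial sums below reproduce the boundary values u(0, t) = 0 and u(1, t) = 1 of the unit-interval problem:

```python
import math

def u(x, t, terms=50):
    # partial sum of the erfc series for the heat problem on 0 < x < 1
    r = 2 * math.sqrt(t)
    s = 0.0
    for n in range(terms):
        s += math.erfc((2 * n + 1 - x) / r) - math.erfc((2 * n + 1 + x) / r)
    return s

print(u(0.0, 0.3), round(u(1.0, 0.3), 6))  # boundary values 0.0 and 1.0
print(u(0.5, 0.3))                         # interior value, between 0 and 1
```

At x = 1 the series telescopes to erfc(0) = 1, which is why the boundary value is recovered exactly.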

5.3 Wave Equation


Consider the initial and boundary value problem for the wave equation

utt = c^2 uxx , 0 < x < l, t > 0 (5.17)

u(0, t) = 0, u(l, t) = 0, t > 0 (5.18)

u(x, 0) = f (x), ut (x, 0) = g(x), 0 < x < l (5.19)

where u(x, t) represents the displacement, for example of a vibrating string
from its equilibrium position, and c the wave speed.
Such equations have been used to model vibrating membranes, acoustic
problems to determine the velocity potential of the fluid flow through which sound
can be transmitted, longitudinal vibrations of an elastic rod or beam, and
electric and magnetic fields.

5.3.1 Solution Using Fourier Series


We look for a non-trivial solution

u(x, t) = X(x)T (t) (5.20)

where X(x) and T (t) are functions of single variables x and t, respectively.
Differentiating (5.20) with respect to x and t, and substituting the partial
derivatives in equation (5.17), we obtain
X''(x)/X(x) = (1/c^2) T''(t)/T(t). (5.21)

Equation (5.21) holds identically for every 0 < x < l and t > 0. Notice that the left
side of this equation is a function which depends only on x, while the right
hand side is a function of t only. Since x and t are independent variables, this
can happen only if each side of (5.21) is equal to the same constant λ:

X''(x)/X(x) = (1/c^2) T''(t)/T(t) = λ.
From the last equation we obtain the two ordinary differential equations

X 00 (x) − λX(x) = 0 (5.22)

and
T 00 (t) − c2 λT (t) = 0. (5.23)
From the boundary conditions (5.18) it follows that

X(0)T (t) = X(l)T (t) = 0, t > 0.


This implies

X(0) = X(l) = 0

since T (t) = 0 would imply u(x, t) = 0. Solving the eigenvalue problem (5.22)
with the above boundary conditions just as in Chapter 4, we find that the
eigenvalues λn and corresponding eigenfunctions Xn (x) are given by

λn = −(nπ/l)^2 , Xn (x) = sin(nπx/l), n = 1, 2, . . . .
The solution of the differential equation (5.23) corresponding to the above λn
is given by
Tn (t) = an cos(nπct/l) + bn sin(nπct/l), n = 1, 2, ...

where an and bn are constants to be determined. Therefore, we obtain a se-
quence of functions

un (x, t) = Xn (x)Tn (t)

or

un (x, t) = [an cos(nπct/l) + bn sin(nπct/l)] sin(nπx/l), n = 1, 2, . . . ,

each of which satisfies the wave equation (5.17) and the boundary conditions
(5.18). Since the wave equation and the boundary conditions are linear and
homogeneous, a function u(x, t) of the form
u(x, t) = Σ_{n=1}^∞ un (x, t) = Σ_{n=1}^∞ [an cos(nπct/l) + bn sin(nπct/l)] sin(nπx/l) (5.24)

will also satisfy the wave equation and the boundary conditions. If we assume
that the above series is convergent and that it can be differentiated term by
term with respect to t, from (5.24) and the initial conditions (5.19), we obtain

X nπx
f (x) = u(x, 0) = an sin ,0<x<l (5.25)
n=1
l

and

X nπc nπx
g(x) = ut (x, 0) = bn sin , 0 < x < l. (5.26)
n=1
l l
Using the Fourier sine series (from Chapter 4) for the functions f (x) and g(x)
from (5.25) and (5.26), we obtain

Zl Zl
2 nπx 2 nπx
an = f (x) sin dx, bn = g(x) sin dx, n = 1, 2, . . . .
l l nπc l
0 0

A formal justification that the function u(x, t) is the solution of the wave
equation is given by Theorem 45 in Chapter 4.
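For a concrete check of the coefficient formulas, take the assumed initial data f(x) = sin(πx/l) and g(x) = 0 (hypothetical choices for illustration). Then all b_n vanish, and the numerically computed a_n pick out only a_1 = 1, so u(x, t) = sin(πx/l) cos(πct/l):

```python
import math

l = 1.0                                  # assumed string length
f = lambda x: math.sin(math.pi * x / l)  # assumed initial displacement; g = 0

def a_n(n, m=2000):
    # a_n = (2/l) * integral_0^l f(x) sin(n pi x / l) dx, trapezoidal rule
    h = l / m
    s = sum((0.5 if k in (0, m) else 1.0)
            * f(k * h) * math.sin(n * math.pi * k * h / l)
            for k in range(m + 1))
    return (2 / l) * s * h

print(round(a_n(1), 6), abs(a_n(2)) < 1e-9)  # a_1 = 1, a_2 = 0
```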
5.3.2 Solution Using Fourier Transform

Let us recall notations of Fourier transforms and a few important proper-
ties

F u(x, t) = U (ω, t), F f (x) = F (ω), F g(x) = G(ω)

F (u(x, t)) = U (ω, t) = ∫_{−∞}^∞ u(x, t) e^(−iωx) dx

F (ux (x, t)) = iωU (ω, t)

F (uxx (x, t)) = (iω)^2 U (ω, t) = −ω^2 U (ω, t)

u(x, t) = F^(−1) U (ω, t) = (1/2π) ∫_{−∞}^∞ U (ω, t) e^(iωx) dω

F (ut (x, t)) = d/dt (F (u(x, t)))

F (utt (x, t)) = d^2/dt^2 (F (u(x, t))) = Utt (ω, t).
Let us now solve the wave equation with initial conditions

utt = c^2 uxx , −∞ < x < ∞, t > 0 (5.27)

u(x, 0) = f (x), ut (x, 0) = g(x), −∞ < x < ∞ (5.28)


using the Fourier transform. Applying the Fourier transform F to the wave
equation, we get
F (utt (x, t)) = c2 F (uxx (x, t))
or the ordinary differential equation

d^2 U (ω, t)/dt^2 + c^2 ω^2 U (ω, t) = 0.
The general solution is

U (ω, t) = C1 (ω) cos(ωct) + C2 (ω) sin(ωct).

Now,

C1 (ω) = U (ω, 0) = F (ω)

and

ωcC2 (ω) = Ut (ω, 0) = G(ω),

so

C2 (ω) = G(ω)/(ωc).
Hence

U (ω, t) = F (ω) cos(ωct) + (1/(ωc)) G(ω) sin(ωct).

Applying the inverse Fourier transform, we get

u(x, t) = (1/2π) ∫_{−∞}^∞ [ F (ω) cos(ωct) + (1/(ωc)) G(ω) sin(ωct) ] e^(iωx) dω.
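When g = 0, the inversion collapses (via cos(ωct) F(ω) = ½ F(ω)(e^(iωct) + e^(−iωct)) and the shift property) to d'Alembert's form u(x, t) = ½[f(x − ct) + f(x + ct)]. A finite-difference check with an assumed Gaussian profile and wave speed:

```python
import math

c = 2.0                          # assumed wave speed
f = lambda x: math.exp(-x**2)    # assumed initial displacement; g = 0

# d'Alembert form of the Fourier-transform solution
u = lambda x, t: 0.5 * (f(x - c * t) + f(x + c * t))

h, x0, t0 = 1e-4, 0.3, 0.5       # arbitrary sample point
utt = (u(x0, t0 + h) - 2 * u(x0, t0) + u(x0, t0 - h)) / h**2
uxx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / h**2
print(abs(utt - c**2 * uxx))     # essentially zero: u_tt = c^2 u_xx
```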

5.3.3 Solution Using Laplace Transform


As usual, we use the following notations for the Laplace transform

Lu(x, t) = U (x, s), Ly(x, t) = Y (x, s), Lf (x, t) = F (x, s)

and

L (ut (x, t)) = sU (x, s) − u(x, 0)

L (utt (x, t)) = s^2 U (x, s) − su(x, 0) − ut (x, 0)

L (ux (x, t)) = d/dx (Lu(x, t))

L (uxx (x, t)) = d^2/dx^2 (Lu(x, t)) = d^2 U (x, s)/dx^2 .
Now, taking the Laplace transform of both sides of the wave equation (5.17)
with c = 1, l = 1, and using the initial conditions u(x, 0) = 0, ut (x, 0) = sin πx,
we have

d^2 U/dx^2 − s^2 U = − sin πx

and the general solution of the above ordinary differential equation is

U (x, s) = c1 e^(sx) + c2 e^(−sx) + sin πx/(s^2 + π^2 ).

Using the boundary conditions u(0, t) = u(1, t) = 0 for the function u(x, t),
we see that the function U (x, s) satisfies the conditions

U (0, s) = Lu(0, t) = 0, U (1, s) = Lu(1, t) = 0.

These conditions for U (x, s) imply c1 = c2 = 0 and so

U (x, s) = sin πx/(s^2 + π^2 ).

Therefore

u(x, t) = L^(−1) (U (x, s)) = L^(−1) ( sin πx/(s^2 + π^2 ) )

= sin πx (1/π) L^(−1) ( π/(s^2 + π^2 ) ) = (1/π) sin πx sin πt.
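The inverse transform used in the last step, L^(−1)(1/(s^2 + π^2)) = (1/π) sin πt, can be spot-checked by evaluating the forward transform ∫_0^∞ e^(−st) (1/π) sin πt dt numerically at an arbitrary value of s:

```python
import math

def laplace(g, s, upper=30.0, m=300000):
    # trapezoidal approximation of integral_0^upper e^(-s t) g(t) dt
    h = upper / m
    total = 0.0
    for k in range(m + 1):
        t = k * h
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * math.exp(-s * t) * g(t)
    return total * h

g = lambda t: math.sin(math.pi * t) / math.pi
s = 1.5                                              # arbitrary sample value
print(abs(laplace(g, s) - 1 / (s**2 + math.pi**2)))  # essentially zero
```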
5.4 Laplace Equation

∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = 0. (5.29)

The equation is also written as ∇^2 φ = 0, where ∇ = (∂/∂x, ∂/∂y). This equation
has no dependence on time, just on the spatial variables x, y. The Laplace
equation describes steady state situations such as:
(i) Temperature distributions

(ii) Stress distributions


(iii) Potential distributions (also called potential equations)
(iv) Flows, for example in a cylinder or around a corner
Stress analysis example: Dirichlet conditions
One steady state stress analysis problem, which satisfies the Laplace equation
is a stretched elastic membrane on a rectangular form that has prescribed
out-of-plane displacements along the boundaries.
To solve:

∂^2 w/∂x^2 + ∂^2 w/∂y^2 = 0.

[Figure 5.1 shows a rectangle 0 ≤ x ≤ a, 0 ≤ y ≤ b with w = 0 on the sides
x = 0, x = a and y = 0, and w = w0 sin(πx/a) on the top side y = b.]

FIGURE 5.1: Laplace equation with Dirichlet conditions
[Figure 5.2 indicates that w(x, y) is the displacement in the z-direction.]

FIGURE 5.2: Displacement

Boundary conditions are:

w(0, y) = 0, for 0 ≤ y ≤ b.
w(x, 0) = 0, for 0 ≤ x ≤ a.
w(a, y) = 0, for 0 ≤ y ≤ b.
w(x, b) = w0 sin(πx/a), for 0 ≤ x ≤ a.
Solution by separation of variables
Let w(x, y) = X(x)Y (y) be solution of (5.29). Substituting it in (5.29), we get

X''Y + XY'' = 0

X''/X + Y''/Y = 0

X''/X = −Y''/Y = k
where k is a constant that is either equal to 0, or > 0, or < 0.
Case I: When k = 0, then

X(x) = (Ax + B), Y (y) = (Cy + D)

w(0, y) = 0 ⇒ B = 0 or C = D = 0.
If C = D = 0, then Y (y) ≡ 0, so w(x, y) ≡ 0.
Continuing with B = 0, we have

w(x, y) = Ax(Cy + D).

When w(x, 0) = 0 ⇒ ADx = 0


Either A = 0 (so w ≡ 0) or D = 0.
Continuing with w(x, y) = ACxy,

w(a, y) = 0 ⇒ ACay = 0
A = 0 or C = 0 ⇒ w(x, y) ≡ 0.

That is, the case k = 0 is not possible.



Case II: When k > 0,


Suppose that k = α^2 , so that

w(x, y) = (A cosh αx + B sinh αx)(C cos αy + D sin αy).

Recall that cosh 0 = 1, sinh 0 = 0.

w(0, y) = 0 ⇒ A(C cos αy + D sin αy) = 0


C=D=0 ⇒ w(x, y) ≡ 0.

Continue with A = 0 ⇒ w(x, y) = B sinh αx(C cos αy + D sin αy)

w(x, 0) = 0 ⇒ BC sinh αx = 0
B = 0 ⇒ w(x, y) ≡ 0.

Continue with C = 0 ⇒ w(x, y) = BD sinh αx sin αy

w(a, y) = 0 ⇒ BD sinh αa sin αy = 0


so either B = 0 or D = 0 ⇒ w(x, y) ≡ 0.

Again, we find that the case k > 0 is not possible.


Case III: When k < 0,
Suppose that k = −α^2 , so that

w(x, y) = (A cos αx + B sin αx)(C cosh αy + D sinh αy)

w(0, y) = 0 ⇒ A(C cosh αy + D sinh αy) = 0


C = D = 0 ⇒ w(x, y) ≡ 0.

Continue with A = 0 ⇒ w(x, y) = B sin αx(C cosh αy + D sinh αy)

w(x, 0) = 0 ⇒ BC sin αx = 0
B = 0 ⇒ w(x, y) ≡ 0.

Continue with C = 0 ⇒ w(x, y) = BD sin αx sinh αy

w(a, y) = 0 ⇒ BD sin αa sinh αy = 0.

Either B = 0 or D = 0 (which again gives w(x, y) ≡ 0), or

sin αa = 0 ⇒ α = nπ/a

⇒ wn (x, y) = BD sin(nπx/a) sinh(nπy/a).
Applying the first three boundary conditions, we have

w(x, y) = Σ_{n=1}^∞ Kn sin(nπx/a) sinh(nπy/a).
The final boundary condition is

w(x, b) = w0 sin(πx/a)

which gives

w0 sin(πx/a) = Σ_{n=1}^∞ Kn sin(nπx/a) sinh(nπb/a).

We can see from this that n must take only one value, namely 1, so that
K1 = w0 / sinh(πb/a) and the final solution to the stress distribution is

w(x, y) = (w0 / sinh(πb/a)) sin(πx/a) sinh(πy/a).
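A finite-difference check of this closed-form solution (the plate dimensions a, b and amplitude w0 below are assumed sample values): the function is harmonic in the interior and matches the prescribed displacement on the top edge y = b.

```python
import math

a, b, w0 = 2.0, 1.0, 3.0         # assumed dimensions and amplitude
w = lambda x, y: (w0 / math.sinh(math.pi * b / a)
                  * math.sin(math.pi * x / a) * math.sinh(math.pi * y / a))

h, x0, y0 = 1e-4, 0.7, 0.4       # arbitrary interior point
wxx = (w(x0 + h, y0) - 2 * w(x0, y0) + w(x0 - h, y0)) / h**2
wyy = (w(x0, y0 + h) - 2 * w(x0, y0) + w(x0, y0 - h)) / h**2
print(abs(wxx + wyy))                                      # ~0: harmonic
print(abs(w(0.7, b) - w0 * math.sin(math.pi * 0.7 / a)))   # ~0: top edge condition
```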

5.4.1 Solution Using Fourier Series


Example 235. Consider the following Laplace equation with Dirichlet
boundary conditions

uxx + uyy = 0, 0 < x < a, 0 < y < b
u(x, 0) = 0, 0 ≤ x ≤ a
u(0, y) = u(a, y) = 0, 0 ≤ y ≤ b
u(x, b) = f (x), 0 ≤ x ≤ a.

We solve this problem by separation of variables. Let u(x, y) = X(x)Y (y)


and substitute into Laplace’s equation to obtain

X 00 Y + XY 00 = 0

or

X''/X = −Y''/Y = −λ

in which both X''/X and −Y''/Y must be constant because x and y are indepen-
dent. Then
X'' + λX = 0
and
Y'' − λY = 0.
From the boundary conditions

u(x, 0) = X(x)Y (0) = 0,

hence Y (0) = 0 because X(x) cannot be identically zero. Similarly,

X(0) = X(a) = 0.

The problem for X is the well-known eigenvalue problem

X'' + λX = 0, X(0) = X(a) = 0

with eigenvalues

λn = n^2 π^2 /a^2 , n = 1, 2, . . .

and eigenfunctions

Xn (x) = sin(nπx/a), n = 1, 2, . . . .
For any eigenvalue, the corresponding equation for Y is

Y'' − (n^2 π^2 /a^2 ) Y = 0

with general solutions

Yn = cn e^(nπy/a) + dn e^(−nπy/a) ,

but Y (0) = 0 so dn = −cn and Yn is of the form

Yn = 2cn sinh(nπy/a), n = 1, 2, . . .
and therefore,

un (x, y) = Xn (x)Yn (y) = bn sin(nπx/a) sinh(nπy/a), n = 1, 2, . . .

which all satisfy the Laplace equation in the rectangle and the homogeneous
boundary conditions on the lower and vertical sides. To find a solution satis-
fying the condition on the side y = b, we use the superposition principle

u(x, y) = Σ_{n=1}^∞ bn sin(nπx/a) sinh(nπy/a). (5.30)

Now, we need to choose bn so that

u(x, b) = Σ_{n=1}^∞ bn sin(nπx/a) sinh(nπb/a) = f (x).

Since this is a Fourier sine expansion of f on [0, a], we have

bn sinh(nπb/a) = (2/a) ∫_0^a f (x) sin(nπx/a) dx
and thus

bn = (2/(a sinh(nπb/a))) ∫_0^a f (x) sin(nπx/a) dx.
0

With this choice of bn , Equation (5.30) gives the solution of this Dirichlet
problem for a rectangle.
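To exercise the formula for bn, take the assumed edge data f(x) = sin(3πx/a) with a = b = 1 (a hypothetical choice). Only b_3 survives, and the resulting series reproduces f on the edge y = b:

```python
import math

a, b = 1.0, 1.0                                  # assumed rectangle dimensions
f = lambda x: math.sin(3 * math.pi * x / a)      # assumed data on y = b

def b_n(n, m=2000):
    # b_n = 2 / (a sinh(n pi b / a)) * integral_0^a f(x) sin(n pi x / a) dx
    h = a / m
    s = sum((0.5 if k in (0, m) else 1.0)
            * f(k * h) * math.sin(n * math.pi * k * h / a)
            for k in range(m + 1))
    return 2 / (a * math.sinh(n * math.pi * b / a)) * s * h

def u(x, y, terms=10):
    return sum(b_n(n) * math.sin(n * math.pi * x / a)
               * math.sinh(n * math.pi * y / a)
               for n in range(1, terms + 1))

print(round(u(0.25, b), 6))   # f(0.25) = sin(3*pi/4) ≈ 0.707107
```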
Example 236. Consider the following Laplace equation with mixed boundary
conditions

uxx + uyy = 0, 0 < x < a, 0 < y < b
ux (0, y) = 0, ux (a, y) = 0, 0 < y < b
u(x, 0) = 0, u(x, b) = f (x), 0 < x < a.

With u(x, y) = X(x)Y (y), separation of variables leads to

X''/X = −Y''/Y = −λ

X'' + λX = 0

and

Y'' − λY = 0.
From the boundary conditions

ux (0, y) = X'(0)Y (y) = 0

and

ux (a, y) = X'(a)Y (y) = 0

we have

X'(0) = X'(a) = 0
because Y (y) cannot be identically zero. Similarly, from

u(x, 0) = X(x)Y (0) = 0

we have
Y (0) = 0.

• Case λ = 0: X'' = 0. The solution of this ordinary differential equation
(ODE) is X(x) = c1 + c2 x. The boundary condition X'(0) = 0 then implies
c2 = 0, and so X(x) = c1 .

• Case λ = α^2 > 0: the ODE becomes

X'' + α^2 X = 0, X'(0) = X'(a) = 0.

The general solution is

X(x) = c1 cos αx + c2 sin αx.


Using the boundary condition X'(0) = 0, with

X'(x) = −c1 α sin αx + c2 α cos αx,

we get c2 = 0. The second boundary condition X'(a) = 0 yields
−c1 α sin αa = 0. Because α > 0, the last equation is satisfied when
αa = nπ, or α = nπ/a, n = 1, 2, . . . . The eigenvalues of the ODE are
λ0 = 0 and λn = n^2 π^2 /a^2 , n = 1, 2, . . . . By corresponding λ0 = 0 with
n = 0, the eigenfunctions of the ODE are

X0 (x) = 1 and Xn (x) = cos(nπx/a), n = 1, 2, . . . .
Let us now solve the ODE for Y subject to the boundary condition Y (0) = 0.
First, λ0 = 0 leads to Y'' = 0, and thus its solution is Y (y) = c3 + c4 y.
But Y (0) = 0 implies c3 = 0 so Y (y) = c4 y. Second, for λn = n^2 π^2 /a^2 , the
ODE becomes

Y'' − (n^2 π^2 /a^2 ) Y = 0

and its general solution is

Yn (y) = c5 cosh(nπy/a) + c6 sinh(nπy/a).

From this solution Y (0) = 0 implies c5 = 0 so Y = c6 sinh(nπy/a). Thus
the products

un (x, y) = Xn (x)Yn (y)

that satisfy the Laplace equation and the homogeneous boundary con-
ditions are

A0 y and An sinh(nπy/a) cos(nπx/a)

where A0 = c1 c4 . The superposition principle yields

u(x, y) = A0 y + Σ_{n=1}^∞ An sinh(nπy/a) cos(nπx/a).

Finally, by substituting y = b in the above equation, we get

u(x, b) = f (x) = A0 b + Σ_{n=1}^∞ An sinh(nπb/a) cos(nπx/a)

which is the cosine Fourier series of f . Therefore, by Chapter 4,

A0 b = a0 /2 and An sinh(nπb/a) = an , n = 1, 2, . . . .
It follows that

A0 = (1/(ab)) ∫_0^a f (x) dx

and

An sinh(nπb/a) = (2/a) ∫_0^a f (x) cos(nπx/a) dx

or

An = (2/(a sinh(nπb/a))) ∫_0^a f (x) cos(nπx/a) dx.

5.4.2 Solution Using Fourier Transform


Example 237. Here we discuss the solution of the Laplace equation using the
Fourier transform for the problem

uxx + uyy = 0, x > 0, −∞ < y < ∞
u(0, y) = f (y), −∞ < y < ∞
lim_{x→∞} u(x, y) = 0, −∞ < y < ∞.

Solution: Let

U (x, ω) = F(u(x, y)) = ∫_{−∞}^∞ u(x, y) e^(−iωy) dy

and

U (0, ω) = F(f ) = F (ω).
If we apply the Fourier transform to the Laplace equation, then in view of the
boundary conditions for the function u(x, y) at x = 0, we get the ordinary
differential equation

d2 U (x, ω)
− ω 2 U (x, ω) = 0, x > 0.
dx2
The general solution of the above equation is given by

U (x, ω) = c1 (ω) e^(|ω|x) + c2 (ω) e^(−|ω|x) .

By using the condition lim_{x→∞} u(x, y) = 0, we obtain c1 (ω) = 0, and from
U (0, ω) = F (ω), we have c2 (ω) = F (ω). Thus

U (x, ω) = F (ω) e^(−|ω|x) .
Now, by applying Table 2.2 for the inverse Fourier transform, we obtain

F^(−1) (e^(−|ω|x) )(y) = (1/π) x/(x^2 + y^2 ).

Therefore, using the convolution theorem for the Fourier transform, we get

u(x, y) = (1/π) [ f ∗ x/(x^2 + y^2 ) ](y) = (1/π) ∫_{−∞}^∞ f (t) x/(x^2 + (y − t)^2 ) dt.
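The inverse Fourier transform of e^(−|ω|x) (the Poisson kernel of the half plane) can be spot-checked numerically, since by symmetry (1/2π) ∫ e^(−|ω|x) e^(iωy) dω = (1/π) ∫_0^∞ e^(−ωx) cos(ωy) dω. The sample point below is arbitrary:

```python
import math

x0, y0 = 0.8, 1.3                # arbitrary sample point with x0 > 0

def inv_ft(m=200000, upper=40.0):
    # (1/pi) * integral_0^upper e^(-omega x0) cos(omega y0) d omega
    h = upper / m
    s = 0.0
    for k in range(m + 1):
        omega = k * h
        weight = 0.5 if k in (0, m) else 1.0
        s += weight * math.exp(-omega * x0) * math.cos(omega * y0)
    return s * h / math.pi

kernel = x0 / (math.pi * (x0**2 + y0**2))
print(abs(inv_ft() - kernel))    # essentially zero
```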

5.4.3 Solution Using Laplace Transform


Example 238. Here we solve the Laplace equation, under appropriate bound-
ary conditions, using the Laplace transform.

uxx + uyy = 0, 0 < x < 1, y > 0
u(0, y) = u(1, y) = 0
u(x, 0) = 0, uy (x, 0) = sin πx.

We define the Laplace transform of u(x, y) with respect to y by

L(u(x, y)) = ∫_0^∞ e^(−sy) u(x, y) dy = U (x, s).

The transforms of the derivatives are

L(uyy ) = s^2 U (x, s) − su(x, 0) − uy (x, 0) (5.31)

and

L(uxx ) = d^2 U/dx^2 . (5.32)

From (5.31) and (5.32) and the boundary conditions with respect to y, we get

d^2 U/dx^2 + s^2 U = sin πx. (5.33)
The general solution of the corresponding homogeneous ordinary differential
equation of (5.33) is given by

Uc (x, s) = c1 cos sx + c2 sin sx.

The method of undetermined coefficients gives a particular solution

Up (x, s) = (1/(s^2 − π^2 )) sin πx,
then the general solution of the non-homogeneous ODE is

U (x, s) = Uc (x, s) + Up (x, s) = c1 cos sx + c2 sin sx + (1/(s^2 − π^2 )) sin πx.

Now, using L (u(0, y)) = U (0, s) = 0 and L (u(1, y)) = U (1, s) = 0, we get
c1 = 0 and c2 sin s = 0 for every s > 0, so c2 = 0. This yields

U (x, s) = (1/(s^2 − π^2 )) sin πx
and, taking the inverse of both sides, we obtain

u(x, y) = L^(−1) ( (1/(s^2 − π^2 )) sin πx )

= (1/π) sin πx L^(−1) ( π/(s^2 − π^2 ) )

= (1/π) sin πx sinh πy.
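The closed form can be verified directly: u = (1/π) sin πx sinh πy is harmonic, vanishes at y = 0 and on the vertical sides (since sin π = 0), and its y-derivative at y = 0 is sin πx. A finite-difference check at an arbitrary interior point:

```python
import math

u = lambda x, y: math.sin(math.pi * x) * math.sinh(math.pi * y) / math.pi

h, x0, y0 = 1e-4, 0.3, 0.4       # arbitrary interior point
uxx = (u(x0 + h, y0) - 2 * u(x0, y0) + u(x0 - h, y0)) / h**2
uyy = (u(x0, y0 + h) - 2 * u(x0, y0) + u(x0, y0 - h)) / h**2
print(abs(uxx + uyy))            # ~0: harmonic

uy0 = (u(x0, h) - u(x0, -h)) / (2 * h)
print(u(x0, 0.0), abs(uy0 - math.sin(math.pi * x0)))  # 0.0 and ~0
```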

5.5 Poisson Equation


Poisson equation (non-homogeneous Laplace equation)
The Poisson equation is given by

∂^2 w/∂x^2 + ∂^2 w/∂y^2 = f (x, y).
One encounters this equation while studying electrostatic potential in the
presence of charge, gravitational potential in the presence of distributed mat-
ter, equilibrium displacement of a membrane under distributed forces, velocity
potential for an inviscid, incompressible, irrotational homogeneous fluid in the
presence of distributed sources or sinks, and steady state temperature in the
presence of thermal sources or sinks. For solutions we refer to Adzievski and
Siddiqi [1].

5.6 Simulation of Heat Equation, Wave Equation and Laplace Equation by MATLAB

Solution of one-dimensional heat equation
The reader is referred to http://www.colorado.edu/geography/class homepages/geog 4023 s07/labs/html/PDE lab.html#1

Problem: Periodic Heat Diffusion in Subsurface Rocks
dz = 0.25;
% each depth step is 1/4 meter
Nz = 400;
% Choose number of depth steps (at least 100 m)
Nt = 5000;
% Choose number of time steps
dt = (365*24*60*60)/Nt;
% Length of each time step in seconds (~6.3e3 seconds, or ~105 minutes)
K = 2*10^-6;
% canonical K is 1e-6 m^2/s
T = 15*ones(Nz+1,Nt+1);
% Create temperature matrix with Nz+1 rows and Nt+1 columns
% Initial guess is that T is 15 everywhere.
time = [0:12/Nt:12];
T(1,:) = 15-10*sin(2*pi*time/12);
% Set surface temperature
maxiter = 500;
for iter = 1:maxiter
    Tlast = T;
    % Save the last guess
    T(:,1) = Tlast(:,end);
    % Initialize the temp at t = 0 to the last temp
    for i = 2:Nt+1
        depth_2D = (T(1:end-2,i-1)-2*T(2:end-1,i-1)+T(3:end,i-1))/dz^2;
        time_1D = K*depth_2D;
        T(2:end-1,i) = time_1D*dt + T(2:end-1,i-1);
        T(end,i) = T(end-1,i);
        % Enforce bottom BC
    end
    err(iter) = max(abs(T(:)-Tlast(:)));
    % Find difference between last two solutions
    if err(iter) < 1e-4
        break;
        % Stop if solutions very similar: we have convergence
    end
end
if iter == maxiter
    warning('Convergence not reached')
end
figure(1)
plot(log(err)), title('Convergence plot')
figure(2)
imagesc([0 12],[0 100],T);
title('Temperature plot (imagesc)')
colorbar
figure(3)
depth = [0:dz:Nz*dz];
contourf(time,-depth,T);
title('Temperature plot (contourf)')
colorbar


FIGURE 5.3: Convergence Plot of Heat Equation

Solution of two-dimensional heat equation in MATLAB:

clear; close all; clc
n = 10;
% Grid has n-2 interior points per dimension (overlapping)
x = linspace(0,1,n); dx = x(2)-x(1); y = x; dy = dx;
TOL = 1e-6;
T = zeros(n);
T(1,1:n) = 10; % TOP
T(n,1:n) = 1;  % BOTTOM
T(1:n,1) = 1;  % LEFT
T(1:n,n) = 1;  % RIGHT
dt = dx^2/4;
error = 1; k = 0;
while error > TOL
FIGURE 5.4: Temperature Plot (imagesc)


FIGURE 5.5: Temperature Plot (contourf)

    k = k+1;
    Told = T;
    for i = 2:n-1
        for j = 2:n-1
            T(i,j) = dt*((Told(i+1,j)-2*Told(i,j)+Told(i-1,j))/dx^2 ...
                + (Told(i,j+1)-2*Told(i,j)+Told(i,j-1))/dy^2) + Told(i,j);
        end
    end
    error = max(max(abs(Told-T)));
end
subplot(2,1,1), contour(x,y,T)
title('Temperature (Steady State)'), xlabel('x'), ylabel('y'), colorbar
subplot(2,1,2), pcolor(x,y,T), shading interp,
title('Temperature (Steady State)'), xlabel('x'), ylabel('y'), colorbar
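The same iteration in Python/NumPy (a sketch, not from the original text). With dt = dx²/4 the explicit update reduces exactly to replacing each interior value by the average of its four neighbors, i.e. a Jacobi sweep for Laplace's equation:

```python
import numpy as np

def steady_heat_2d(n=10, tol=1e-6):
    """Pseudo-time iteration to the steady state of the 2-D heat equation,
    mirroring the MATLAB loop above: top edge at 10, other edges at 1."""
    x = np.linspace(0, 1, n)
    dx = x[1] - x[0]
    dt = dx**2 / 4                      # explicit 2-D stability limit
    T = np.zeros((n, n))
    T[0, :] = 10.0                      # top
    T[-1, :] = 1.0                      # bottom
    T[:, 0] = 1.0                       # left
    T[:, -1] = 1.0                      # right
    err = 1.0
    while err > tol:
        Told = T.copy()
        T[1:-1, 1:-1] = Told[1:-1, 1:-1] + dt * (
            (Told[2:, 1:-1] - 2*Told[1:-1, 1:-1] + Told[:-2, 1:-1]) / dx**2
            + (Told[1:-1, 2:] - 2*Told[1:-1, 1:-1] + Told[1:-1, :-2]) / dx**2)
        err = np.abs(T - Told).max()
    return T
```

Since the boundary data are symmetric about the vertical midline, the computed steady state must share that symmetry, which gives an easy sanity check; the interior values also stay strictly between the boundary extremes 1 and 10 (maximum principle).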

clear; close all; clc;

c = 1;
n = 21;
x = linspace(0,1,n);
dx = 1/(n-1);
dt = dx;
u(:,1) = sin(pi*x);
u(1,2) = 0;
for i = 2:n-1
    u(i,2) = 0.5*(dt^2*c^2*(u(i+1,1)-2*u(i,1)+u(i-1,1))/dx^2 + 2*u(i,1));
end
u(n,2) = 0;
error = 1; k = 1;
while k < 100
    k = k+1;
    u(1,k+1) = 0;
    for i = 2:n-1
        u(i,k+1) = dt^2*c^2*(u(i+1,k)-2*u(i,k)+u(i-1,k))/dx^2 + 2*u(i,k)-u(i,k-1);
    end
    u(n,k+1) = 0;
end
plot(x,u), xlabel('x'), ylabel('y')

Wave equation in one dimension

FIGURE 5.6: Temperature (Steady State)
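The one-dimensional wave solver above can be cross-checked in Python. The sketch below (NumPy, not part of the original text) implements the same leapfrog scheme; because the MATLAB listing takes dt = dx (Courant number 1 with c = 1), the scheme reproduces the exact standing wave sin(πx)cos(πct) at the grid points, which the final comparison exploits:

```python
import numpy as np

def wave_leapfrog(n=21, steps=100, c=1.0):
    """Leapfrog scheme for u_tt = c^2 u_xx on [0,1] with u(x,0) = sin(pi x),
    u_t(x,0) = 0 and u(0,t) = u(1,t) = 0; mirrors the MATLAB loop (dt = dx)."""
    x = np.linspace(0, 1, n)
    dx = 1.0 / (n - 1)
    dt = dx                              # Courant number c*dt/dx = 1
    r2 = (c * dt / dx)**2
    u_prev = np.sin(np.pi * x)           # u at t = 0
    u = u_prev.copy()
    # special first step from the Taylor expansion, using u_t(x,0) = 0
    u[1:-1] = u_prev[1:-1] + 0.5 * r2 * (u_prev[2:] - 2*u_prev[1:-1] + u_prev[:-2])
    u[0] = u[-1] = 0.0
    for _ in range(steps - 1):
        u_next = np.zeros(n)
        u_next[1:-1] = (2*u[1:-1] - u_prev[1:-1]
                        + r2 * (u[2:] - 2*u[1:-1] + u[:-2]))
        u_prev, u = u, u_next
    return x, u, steps * dt
```

After 100 steps of size 0.05 the final time is t = 5, and the computed profile agrees with sin(πx)cos(πt) to machine precision.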

clear; close all; clc;

for c = [1 -1]
    cc = 0;
    n = 261;
    x = linspace(0,13,n);
    u = zeros(n,1);
    u(121:141) = sin(pi*x(121:141));
    dx = x(2)-x(1);
    dt = dx;
    error = 1;
    TOL = 1e-6;
    k = 0;
    while k < 110
        uold = u;
        k = k+1;
        for i = 2:n-1
            if c == 1, u(i) = dt*c*(uold(i+1)-uold(i))/dx+uold(i); end   % c = 1
            if c == -1, u(i) = dt*c*(uold(i)-uold(i-1))/dx+uold(i); end  % c = -1
        end
        error = max(abs(u-uold));
        if mod(k,10)==0, cc = cc+1; out(cc,:) = u; end
    end
    if c == 1
        subplot(2,1,1)
        for hh = 1:cc
            plot(x,out(hh,:)+hh), hold on
        end
        u = zeros(n,1);
        u(121:141) = sin(pi*x(121:141)); plot(x,u)
        xlabel('u(x)'), ylabel('Time'), title('Translation to the Left')
    elseif c == -1
        subplot(2,1,2)
        for hh = 1:cc
            plot(x,out(hh,:)+hh), hold on
        end
        u = zeros(n,1);
        u(121:141) = sin(pi*x(121:141)); plot(x,u)
        xlabel('u(x)'), ylabel('Time'), title('Translation to the Right')
    end
end
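In Python the same upwind updates read as follows (a sketch, not from the original text). With dt = dx each update reduces to an exact one-cell shift per step, which is why the pulses in Figure 5.7 translate without distortion:

```python
import numpy as np

def advect(direction='left', n=261, steps=110):
    """First-order upwind scheme for the one-way wave (advection) equation
    u_t = c u_x with c = +1 (left-moving) or c = -1 (right-moving),
    mirroring the MATLAB loop above."""
    x = np.linspace(0, 13, n)
    dx = x[1] - x[0]
    dt = dx                                     # exact-shift time step
    u = np.zeros(n)
    u[120:141] = np.sin(np.pi * x[120:141])     # initial bump on [6, 7]
    for _ in range(steps):
        uold = u.copy()
        if direction == 'left':    # u_t = u_x : forward (upwind) difference
            u[1:-1] = uold[1:-1] + dt * (uold[2:] - uold[1:-1]) / dx
        else:                      # u_t = -u_x : backward (upwind) difference
            u[1:-1] = uold[1:-1] - dt * (uold[1:-1] - uold[:-2]) / dx
    return x, u
```

The initial bump peaks at index 130 (x = 6.5, where sin(πx) = 1); after 110 exact shifts the peak sits at index 20 for the left-moving case and index 240 for the right-moving case, with its height preserved.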

Wave equation in two dimensions



FIGURE 5.7: (a) Translation to Left (b) Translation to Right
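A genuinely two-dimensional wave solver is a small extension of the one-dimensional scheme. The Python sketch below is illustrative and not from the original text: it applies the leapfrog scheme to u_tt = c²(u_xx + u_yy) on the unit square with homogeneous Dirichlet data. The initial condition sin(πx)sin(πy) is a standing mode, so the exact solution sin(πx)sin(πy)cos(√2 πct) is available for checking:

```python
import numpy as np

def wave2d(n=41, steps=40, c=1.0):
    """Leapfrog scheme for u_tt = c^2 (u_xx + u_yy) on the unit square with
    u = 0 on the boundary, u(x,y,0) = sin(pi x) sin(pi y), u_t(x,y,0) = 0."""
    x = np.linspace(0, 1, n)
    dx = x[1] - x[0]
    dt = 0.5 * dx                      # CFL in 2-D requires c*dt/dx <= 1/sqrt(2)
    r2 = (c * dt / dx)**2
    X, Y = np.meshgrid(x, x, indexing='ij')
    u_prev = np.sin(np.pi * X) * np.sin(np.pi * Y)
    lap = lambda v: (v[2:, 1:-1] + v[:-2, 1:-1] + v[1:-1, 2:] + v[1:-1, :-2]
                     - 4 * v[1:-1, 1:-1])
    u = u_prev.copy()
    u[1:-1, 1:-1] = u_prev[1:-1, 1:-1] + 0.5 * r2 * lap(u_prev)  # first step
    for _ in range(steps - 1):
        u_next = np.zeros_like(u)
        u_next[1:-1, 1:-1] = (2*u[1:-1, 1:-1] - u_prev[1:-1, 1:-1]
                              + r2 * lap(u))
        u_prev, u = u, u_next
    return X, Y, u, steps * dt
```

On this grid the numerical dispersion error is tiny, so the computed field at t = 0.5 matches the separated exact solution to well under one percent.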



FIGURE 5.8: Solution of Laplace Equation in one dimension

Laplace equation in one dimension with MATLAB (Dirichlet boundary condition)
(Reference: https://math.stackexchange.com/questions/692532/laplace-equation-in-1d-with-matlab-dirichlet-boundary-condition)
MATLAB Program
% Solve equation -u''(x) = f(x) with Dirichlet boundary conditions
clear all
close all
N=2;
for j=1:10
    a=0;
    b=1;
    N=2*N;
    M(j)=N;
    delta_x=(b-a)/N;
    k=2;
    A=sparse(N-1,N-1);
    for i=1:N-1
        if (i==1)
            A(i,i)=2;
            A(i,i+1)=-1;
        elseif (i==N-1)
            A(i,i-1)=-1;
            A(i,i)=2;
        else
            A(i,i-1)=-1;
            A(i,i+1)=-1;
            A(i,i)=2;
        end
    end
    A=A/((delta_x)^2);
    for i=1:N-1
        b(i)=functionf(i*delta_x,k);
    end
    u=A^(-1)*b';
    for i=1:N+1
        u_ex(i)=exact_solution((i-1)*delta_x,k);
    end
    for i=1:N+1
        if (i==1)
            u_dis(i)=0;
        elseif (i==N+1)
            u_dis(i)=0;
        else
            u_dis(i)=u(i-1,1);
        end
    end
    for i=1:N+1
        t(i)=(i-1)*delta_x;
    end
    norm_maxl2(j)=0;
    for i=1:N+1
        if (abs(u_dis(i)-u_ex(i)) > norm_maxl2(j))
            norm_maxl2(j)=abs(u_dis(i)-u_ex(i));
        end
    end
    norm_l2(j)=0;
    for i=1:N+1
        norm_l2(j)=norm_l2(j)+(u_dis(i)-u_ex(i))^2*delta_x;
    end
    norm_l2(j)=(norm_l2(j))^(1/2);
    norm_maxh1(j)=0;
    for i=1:N
        if (abs(((u_dis(i+1)-u_ex(i+1))-(u_dis(i)-u_ex(i)))/delta_x) > norm_maxh1(j))
            norm_maxh1(j)=abs(((u_dis(i+1)-u_ex(i+1))-(u_dis(i)-u_ex(i)))/delta_x);
        end
    end
    norm_h1(j)=0;
    for i=1:N
        norm_h1(j)=norm_h1(j)+(((u_dis(i+1)-u_ex(i+1))-(u_dis(i)-u_ex(i)))/delta_x)^2*delta_x;
    end
    norm_h1(j)=(norm_h1(j))^(1/2);
    figure
    hold on
    plot(t,u_ex,'blue', t,u_dis,'red')
    ylim([-0.02 0.12]);
    hold off
end
plot(log(M), -log(norm_maxl2),'blue', log(M), -log(norm_l2),'red', ...
     log(M), -log(norm_maxh1),'cyan', log(M), -log(norm_h1),'magenta', ...
     log(M), 2*log(M),'black')

function f=functionf(x,k)
if (k==1)
    f=2;
end
if (k==2)
    f=-6*x+12*x^2;
end

function u_ex=exact_solution(x,k);
if (k==1)
u_ex=x*(1-x);
end
if(k==2)
u_ex=x^3*(1-x);
end
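A compact Python version of the same second-order scheme (a sketch using dense matrices rather than MATLAB's sparse storage) makes the convergence behavior easy to verify: the scheme is exact for the quadratic test solution x(1 − x), and for u = x³(1 − x) the maximum error drops by a factor of about 4 each time N doubles, i.e. O(h²) convergence:

```python
import numpy as np

def poisson_1d(N, f, a=0.0, b=1.0):
    """Second-order finite differences for -u''(x) = f(x) on [a, b] with
    u(a) = u(b) = 0; mirrors the MATLAB program above."""
    h = (b - a) / N
    x = a + h * np.arange(1, N)                      # interior nodes
    A = (np.diag(2.0 * np.ones(N - 1))
         - np.diag(np.ones(N - 2), 1)
         - np.diag(np.ones(N - 2), -1)) / h**2
    u = np.linalg.solve(A, f(x))
    return x, u
```

With f(x) = 2 the exact solution is x(1 − x); with f(x) = −6x + 12x² it is x³(1 − x), matching the two cases selected by k in the MATLAB program.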

FIGURE 5.9: Solution of Laplace Equation in Two dimensions

FIGURE 5.10: Solution of Laplace Equation in 1D

Laplace equation in two dimensions


There are two parallel semi-infinite metal plates in the XZ plane. One plate is
located at y = −1m and extends from x = 0m to x = 4m and the other plate
is located at y = +1m and extends from x = 0m to x = 4m. At x = 0m, an
FIGURE 5.11: Solution

FIGURE 5.12: Solution

FIGURE 5.13: Solution

FIGURE 5.14: Solution

FIGURE 5.15: Solution

insulated strip connects the two semi-infinite plates and is held at a constant
potential 100V . Find the potential and the electric field in the region bounded
by the plates.
Figure 5.17 plots the potential and shows that the potential V has no local maxima or minima; all extremes are on the boundaries. The solution of the Laplace equation is the most featureless function possible consistent with the boundary conditions: no hills, no valleys, only the smoothest surface.
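A numerical sketch of this boundary value problem in Python (illustrative assumptions: the semi-infinite plates are truncated at x = 4 m, where V is taken to be 0, and a uniform grid with Jacobi iteration is used):

```python
import numpy as np

def plates_potential(nx=81, ny=41, iters=5000):
    """Jacobi iteration for Laplace's equation between the plates:
    V = 0 on the plates (y = +/-1, 0 <= x <= 4), V = 100 V on the strip
    at x = 0, and V approximated as 0 at the truncated far edge x = 4."""
    V = np.zeros((nx, ny))
    V[0, 1:-1] = 100.0                    # strip at x = 0
    for _ in range(iters):
        V[1:-1, 1:-1] = 0.25 * (V[2:, 1:-1] + V[:-2, 1:-1]
                                + V[1:-1, 2:] + V[1:-1, :-2])
    return V
```

The computed field illustrates exactly the "featureless" behavior described above: all interior values lie strictly between the boundary extremes 0 and 100 V, and the solution is symmetric about the mid-plane y = 0.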
FIGURE 5.16: Solution

FIGURE 5.17: Solution

5.7 Solving Partial Differential Equations by MATHEMATICA

Solving Differential Equations Symbolically

FIGURE 5.18: Simulation of Heat Equation by Mathematica

FIGURE 5.19: Solution of Heat Equation by Mathematica


FIGURE 5.20: Solution of Heat Equation by Mathematica

Example 239. Find the general solution of the partial differential equation

2z_x(x, t) + 5z_t(x, t) = z(x, t) + 1.



FIGURE 5.21: Potential V in region between plates

In[ ]:= DSolve[2 D[z[x, t], x] + 5 D[z[x, t], t] == z[x, t] + 1, z, {x, t}]
Out[ ]:= {{z -> Function[{x, t}, E^(x/2) C[1][(2 t - 5 x)/2] - 1]}}
Example 240. Find the general solution of the second order partial differential equation

3z_xx(x, t) − 2z_tt(x, t) = 1.

In[ ]:= DSolve[3 D[z[x, t], {x, 2}] - 2 D[z[x, t], {t, 2}] == 1, z, {x, t}]
Out[ ]:= {{z -> Function[{x, t}, C[1][t - Sqrt[2/3] x] + C[2][t + Sqrt[2/3] x] + x^2/6]}}
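The two general solutions can be spot-checked numerically by picking concrete instances of the arbitrary functions and verifying the PDE residual with finite differences (Python sketch; the choices C1 = sin and C2 = cos, and the test point, are arbitrary):

```python
import math

h = 1e-4                     # step for central finite differences
x0, t0 = 0.7, 1.3            # an arbitrary test point

# Example 239: z = e^(x/2) C1((2t - 5x)/2) - 1 should satisfy 2 z_x + 5 z_t = z + 1.
z1 = lambda x, t: math.exp(x / 2) * math.sin((2 * t - 5 * x) / 2) - 1
zx = (z1(x0 + h, t0) - z1(x0 - h, t0)) / (2 * h)
zt = (z1(x0, t0 + h) - z1(x0, t0 - h)) / (2 * h)
res1 = 2 * zx + 5 * zt - (z1(x0, t0) + 1)

# Example 240: z = C1(t - sqrt(2/3) x) + C2(t + sqrt(2/3) x) + x^2/6
# should satisfy 3 z_xx - 2 z_tt = 1.
lam = math.sqrt(2.0 / 3.0)
z2 = lambda x, t: math.sin(t - lam * x) + math.cos(t + lam * x) + x**2 / 6
zxx = (z2(x0 + h, t0) - 2 * z2(x0, t0) + z2(x0 - h, t0)) / h**2
ztt = (z2(x0, t0 + h) - 2 * z2(x0, t0) + z2(x0, t0 - h)) / h**2
res2 = 3 * zxx - 2 * ztt - 1

assert abs(res1) < 1e-6 and abs(res2) < 1e-4
```

Both residuals vanish to within finite-difference accuracy, confirming the general solutions.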

Solving Differential Equations Numerically


Example 241. Plot the solution of the initial value problem (partial differ-
ential equation)

ut (x, t) = 9uxx (x, t), 0 < x < 5, 0 < t < 10


u(x, 0) = 0, 0 < x < 5
u(0, t) = sin 2t, u(5, t) = 0, 0 < t < 10.

In[ ]:= NDSolve[{D[u[x,t],t] == 9 D[u[x,t],x,x], u[x,0] == 0,
u[0,t] == Sin[2 t], u[5,t] == 0}, u, {t,0,10}, {x,0,5}]
Out[ ]:= {{u -> InterpolatingFunction[{{0,5},{0,10}}<>]}}
We plot the solution with

FIGURE 5.22: Simulation of Example 241

In[ ]:= Plot3D[Evaluate[u[x,t] /. %], {t,0,10}, {x,0,5},
PlotRange -> All]

5.8 Practical Applications in Physics and Mechanics


Damping
Air resistance and non-elastic effects in a string reduce the amplitudes of the waves, so that the motion dies out after some time. This damping effect can be modeled by a term b ∂u/∂t on the left-hand side of the equation

ϱ ∂²u/∂t² + b ∂u/∂t = T ∂²u/∂x².

The parameter b is normally determined from physical experiments.
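A quick numerical illustration of the damping term (Python sketch; the values ϱ = 1, b = 0.5, T = 1 are arbitrary choices, not from the text): with initial shape sin(πx) and zero initial velocity, the oscillation amplitude should decay roughly like e^{−bt/(2ϱ)}, i.e. to about 0.61 at t = 2.

```python
import numpy as np

def damped_wave(n=101, steps=400, rho=1.0, b=0.5, T=1.0):
    """Leapfrog scheme for rho*u_tt + b*u_t = T*u_xx on [0,1] with fixed ends,
    u(x,0) = sin(pi x), u_t(x,0) = 0; the damping term is centred in time."""
    x = np.linspace(0, 1, n)
    dx = x[1] - x[0]
    c = np.sqrt(T / rho)
    dt = 0.5 * dx / c                      # safely below the CFL limit
    r2 = (c * dt / dx)**2
    u_prev = np.sin(np.pi * x)
    u = u_prev.copy()
    u[1:-1] = u_prev[1:-1] + 0.5 * r2 * (u_prev[2:] - 2*u_prev[1:-1] + u_prev[:-2])
    a_plus = rho / dt**2 + b / (2 * dt)
    a_minus = rho / dt**2 - b / (2 * dt)
    for _ in range(steps - 1):
        u_next = np.zeros(n)
        u_next[1:-1] = (T * (u[2:] - 2*u[1:-1] + u[:-2]) / dx**2
                        + 2 * rho * u[1:-1] / dt**2
                        - a_minus * u_prev[1:-1]) / a_plus
        u_prev, u = u, u_next
    return x, u
```

The run ends at t = steps·dt = 2; the maximum amplitude there is close to e^{−0.5} ≈ 0.61, visibly smaller than the initial amplitude 1, whereas the undamped scheme (b = 0) conserves the amplitude.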
External Forcing
It is easy to include an external force acting on the string. Say we have a vertical force f̃ᵢ acting on mass mᵢ. This force affects the vertical component of Newton's law and gives rise to an extra term f̃(x, t) on the right-hand side of (5.1).
Waves on a membrane
Elastic waves in a rod

Consider an elastic rod subject to a hammer impact at the end. This experiment will give rise to an elastic deformation pulse that travels through the rod. A mathematical model for longitudinal waves along an elastic rod starts with the general equation for deformations and stresses in an elastic medium,

ϱu_tt = ∇·σ + ϱf,   (5.34)

where ϱ is the density, u the displacement field, σ the stress tensor, and f the body forces. The latter normally have no impact on elastic waves. For stationary deformation of an elastic rod, σ_xx = Eu_x, with all other stress components being zero. Moreover, u = u(x)i. The parameter E is known as Young's modulus. Assuming that this simple stress and deformation field, which is exact in the stationary case, is a good approximation in the transient case with wave motion, (5.34) simplifies to

ϱ ∂²u/∂t² = ∂/∂x ( E ∂u/∂x ).

The associated boundary conditions prescribe u or σ_xx = Eu_x; typically u = 0 for a clamped end and σ_xx = 0 for a free end.
Acoustic model for seismic waves
Seismic waves are used to infer properties of subsurface geological structures. The physical model is a heterogeneous elastic medium where sound is propagated by small elastic vibrations. The general mathematical model for deformations in an elastic medium is based on Newton's second law,

ϱu_tt = ∇·σ + ϱf,   (5.35)

and a constitutive law relating σ to u, often Hooke's generalized law,

σ = K∇·u I + G(∇u + (∇u)ᵀ − (2/3)∇·u I).   (5.36)

Here u is the displacement field, σ is the stress tensor, I is the identity tensor, ϱ is the medium's density, f are body forces (such as gravity), K is the medium's bulk modulus and G is the shear modulus. All these quantities may vary in space, while u and σ will also show significant variations in time during wave motion.
The acoustic approximation to elastic waves arises from a basic assumption that the second term in Hooke's law, representing the deformations that give rise to shear stresses, can be neglected. This assumption can be interpreted as approximating the geological medium by a fluid. Neglecting also the body forces f, (5.35) becomes

ϱu_tt = ∇(K∇·u).   (5.37)

Introducing the pressure p via

p = −K∇·u,   (5.38)

and dividing (5.37) by ϱ, we get

u_tt = −(1/ϱ)∇p.

Taking the divergence of this equation and using ∇·u = −p/K from (5.38) gives the acoustic approximation to elastic waves:

p_tt = K∇·((1/ϱ)∇p).   (5.39)

This is a standard linear wave equation with variable coefficients. It is common to add a source term s(x, y, z, t) to model the generation of sound waves:

p_tt = K∇·((1/ϱ)∇p) + s.   (5.40)

A common additional approximation is based on using the chain rule on the right-hand side,

K∇·((1/ϱ)∇p) = (K/ϱ)∇²p + K∇(1/ϱ)·∇p ≈ (K/ϱ)∇²p   (5.41)

under the assumption that the relative spatial gradient ∇(1/ϱ) = −(1/ϱ²)∇ϱ is small. This approximation results in the simplified equation

p_tt = (K/ϱ)∇²p + s.   (5.42)

The acoustic approximations to seismic waves are used for sound waves in the ground, and the Earth's surface is then a boundary where p equals the atmospheric pressure p₀, so that the boundary condition becomes p = p₀.
Anisotropy
Quite often in geological materials, the effective wave velocity c = √(K/ϱ) is different in different spatial directions, because geological layers are compacted such that the properties in the horizontal and vertical directions differ. With z as the vertical coordinate, we can introduce a vertical wave velocity c_z and a horizontal wave velocity c_h, and generalize (5.42) to

p_tt = c_z² p_zz + c_h² (p_xx + p_yy) + s.   (5.43)

Sound waves in liquids and gases

Sound waves arise from pressure and density variations in fluids. The starting point for modeling sound waves is the set of basic equations for a compressible fluid, where we omit viscous (frictional) forces, body forces (gravity, for instance), and temperature effects:

ϱ_t + ∇·(ϱu) = 0,   (5.44)
ϱu_t + ϱu·∇u = −∇p,   (5.45)
ϱ = ϱ(p).   (5.46)

These equations are often referred to as the Euler equations for the motion of a fluid. The quantities involved are the density ϱ, the velocity u, and the pressure p. Equation (5.44) reflects mass balance, (5.45) is Newton's second law for a fluid with frictional and body forces omitted, and (5.46) is a constitutive law relating density to pressure by thermodynamic considerations. A typical model for (5.46) is the so-called isentropic relation, valid for adiabatic processes where there is no heat transfer:

ϱ = ϱ₀ (p/p₀)^(1/γ).

Here p₀ and ϱ₀ are reference values for p and ϱ when the fluid is at rest, and γ is the ratio of the specific heats at constant pressure and constant volume (γ = 7/5 for air). Small disturbances about this rest state are modeled by the standard linear wave equation p_tt = c²∇²p, where c = √(γp₀/ϱ₀) is the speed of sound in the fluid.
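For example, with standard sea-level values for air (assumed here: p₀ ≈ 101325 Pa, ϱ₀ ≈ 1.225 kg/m³), the formula c = √(γp₀/ϱ₀) reproduces the familiar speed of sound:

```python
import math

gamma = 7.0 / 5.0          # ratio of specific heats for a diatomic gas (air)
p0 = 101325.0              # Pa, standard atmospheric pressure (assumed value)
rho0 = 1.225               # kg/m^3, sea-level air density (assumed value)
c = math.sqrt(gamma * p0 / rho0)   # speed of sound, about 340 m/s
```

This is within a few m/s of the commonly quoted value of about 343 m/s at 20 °C; the small difference comes from the reference density chosen.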

5.9 Exercises
Separation of variables method for heat equation
In Exercises 5.1 through 5.5 use the separation of variables method to solve the heat equation
ut = a²uxx, 0 < x < l, t > 0.

5.1. a = 3, l = π, u(0, t) = u(π, t) = 0, u(x, 0) = x(π − x)

5.2. a = 1, l = 3, ux (0, t) = ux (3, t) = 0, u(x, 0) = x



5.3. a = 2, l = 4, u(0, t) = ux (4, t) = 0, u(x, 0) = x + 1

5.4. a = 2, l = π, u(0, t) = u(π, t) = 0, u(x, 0) = 20

5.5. a = 2, l = 2, u(0, t) = u(2, t) = 0, and u(x, 0) = { 20, 0 ≤ x < 1;  0, 1 ≤ x < 2 }

Laplace transform method for heat equation


Use the Laplace transform method to solve the heat equation

ut = a2 uxx , 0 < x < l, t > 0

subject to:

5.6. ut = a2 uxx , 0 < x < 1, t > 0, u(x, 0) = 0, u(0, t) = 1, u(1, t) = u0 , u0 is


a constant.
5.7. ut = a2 uxx , 0 < x < ∞, t > 0, u0 and u1 are constants.

Fourier transform method for heat equation


Use the Fourier transform method to solve the indicated initial boundary value
problems on the indicated intervals, subject to the given conditions:
5.8. ut = a2 uxx , −∞ < x < ∞, t > 0, u(x, 0) = µ(x).

5.9. ut = a²uxx, −∞ < x < ∞, t > 0, u(x, 0) = { 1, |x| < 1;  0, |x| > 1 }.

5.10. ut = a2 uxx , u(x, 0) = e−|x| .

5.11. ut = a2 uxx + f (x), −∞ < x < ∞, t > 0.

Use the Fourier transform method to solve the indicated heat equation, subject
to the given initial and boundary value conditions.

ut = a2 uxx , 0 < x < ∞, t > 0.

5.12. ux (0, t) = µ(t), and u(x, 0) = 0



5.13. u(x, 0) = { 1, 0 < x < 1;  0, 1 ≤ x < ∞ }

5.14. u(x, 0) = 0, ut(x, 0) = 1, lim_{x→∞} u(x, t) = 0

5.15. ut = a2 uxx + f (x),0 < x < ∞, t > 0


u(x, 0) = 0, 0 < x < ∞, u(0, t) = 0

Separation of variables for wave equation


Using the method of separation of variables solve the wave equation

utt (x, t) = uxx (x, t), 0 < x < π, t > 0,

subject to the Dirichlet boundary conditions

u(0, t) = u(π, t) = 0, t > 0,

and the following initial conditions.


5.16. u(x, 0) = 0, ut (x, 0) = 1, 0 < x < π.

5.17. u(x, 0) = πx − x2 , ut (x, 0) = 0, 0 < x < π.



5.18. u(x, 0) = { (3/2)x, 0 < x < (2/3)π;  3(π − x), (2/3)π < x < π },  ut(x, 0) = 0, 0 < x < π.

5.19. u(x, 0) = { (3/2)x, 0 < x < π/4;  sin x, π/4 < x < 3π/4;  3(π − x), 3π/4 < x < π },  ut(x, 0) = 1.

5.20. u(x, 0) = { x, 0 < x < π/2;  π − x, π/2 < x < π },  ut(x, 0) = 0, 0 < x < π.

Fourier transform for wave equation

5.21. Use the Fourier transform to solve the following wave equations subject
to the initial and boundary conditions.

utt(x, t) = c²uxx(x, t), −∞ < x < ∞, t > 0,

u(x, 0) = 1/(1 + x²),  ut(x, 0) = 0.

5.22. utt(x, t) = c²uxx(x, t), 0 < x < ∞, t > 0
u(x, 0) = 1/(1 + x²), ut(x, 0) = 0
ux(0, t) = f(t), 0 < t < ∞
(Hint: Use the Fourier cosine transform)

5.23. utt (x, t) = c2 uxx (x, t) + f (x, t), 0 < x < ∞, t > 0
u(x, 0) = 0, ut (x, 0) = 0
ux (0, t) = f (t), 0 < t < ∞
(Hint: Use the Fourier sine transform)

Laplace transform for wave equation


Use the Laplace transform method to solve the indicated initial boundary
value problems on the interval (0, ∞), subject to the given conditions.

utt (x, t) = c2 uxx (x, t), x > 0, t > 0.

5.24. u(0, t) = f (t), lim u(x, t) = 0, t > 0, u(x, 0) = 0, ut (x, 0) = 0, x > 0.


x→∞

5.25. u(0, t) = 0, lim u(x, t) = 0, t > 0, u(x, 0) = xe−x , ut (x, 0) = 0, x > 0.


x→∞

5.26. u(0, t) = sin t, lim u(x, t) = 0 as x → ∞, t > 0, u(x, 0) = 0, ut (x, 0) = 0, x > 0.

Use the Laplace transform to solve the initial boundary value problem for the
wave equation on the interval (0, 1), subject to the given conditions.

5.27. u(x, 0) = sin πx, ut (x, 0) = − sin πx, u(0, t) = u(1, t) = 0.

Separation of variables method for Laplace equation


Using the separation of variables method solve the following Laplace problem
with the given data.

uxx + uyy = 0, 0 < x < a, 0 < y < b


u(x, 0) = f1 (x), u(x, b) = f2 (x), 0 < x < a
u(0, y) = g1 (y), u(a, y) = g2 (y), 0 < y < b.

5.28. a = b = 1, f1 (x) = 100, f2 (x) = 200, g1 (y) = g2 (y) = 0.

5.29. a = b = π, f1 (x) = 0, f2 (x) = 1, g1 (y) = g2 (y) = 1.

5.30. a = 1, b = 2, f1 (x) = 0, f2 (x) = x, g1 (y) = g2 (y) = 0.

5.31. a = 2, b = 1, f1 (x) = 100, f2 (x) = 0, g1 (y) = 0, g2 (y) = 100y(1 − y).

5.32. a = 2, b = 1, f1 (x) = 100, f2 (x) = 0, g1 (y) = 0, g2 (y) = 100y(1 − y).



5.33. a = b = 1, f1(x) = 7 sin(7πx), f2(x) = sin(πx), g1(y) = sin 3πy, g2(y) = 6πy.

5.34. a = b = 1, f1(x) = 7 sin(7πx), f2(x) = sin(πx), g1(y) = sin 3πy, g2(y) = 6πy.

In the following problems, solve the Laplace equation in [0, a] × [0, b] subject
to the given boundary conditions.
5.35. u(x, 0) = f (x), u(x, b) = 0, 0 < x < a; u(0, y) = 0, u(a, y) = 0, 0 < y <
b.

5.36. uy (x, 0) = 0, uy (x, 1) = 0, 0 < x < a; u(0, y) = 0, u(1, y) = 1 − y,


0 < y < b.

5.37. ux (0, y) = u(0, y), u(0, y) = 1, 0 < y < b; u(x, 0) = 0, u(x, π) = 0,


0 < x < a.

5.38. ux (0, y) = u(0, y), ux (0, y) = 0, 0 < y < b; u(x, 0) = 0, u(x, b) = f (x),
0 < x < a.

5.39. ux (0, y) = 0, ux (a, y) = 1, 0 < y < a; u(x, 0) = 1, u(x, b) = 1, 0 < x < a.

5.40. u(x, 0) = 1, u(x, 1) = 4, 0 < x < π; u(0, y) = 0, u(π, y) = 0, 0 < y < 1.

Fourier transform for Laplace equation


Use one of the Fourier transforms to solve the Laplace equation
uxx + uyy = 0, (x, y) ∈ D
on the indicated D, subject to the given boundary value conditions.
5.41. D = {(x, y) : 0 < x < π, 0 < y < ∞}; u(0, y) = 0, u(π, y) = e−y for
y > 0 and uy (x, 0) = 0 for 0 < x < π.

5.42. D = {(x, y) : 0 < x < ∞, 0 < y < 2}; u(0, y) = 0 for 0 < y < 2, u(x, 0) = f(x), u(x, 2) = 0 for 0 < x < ∞.

5.43. D = {(x, y) : 0 < x < ∞, 0 < y < ∞}; u(0, y) = e^{−y} for 0 < y < ∞ and u(x, 0) = e^{−x} for 0 < x < ∞.

5.44. D = {(x, y) : −∞ < x < ∞, 0 < y < ∞}; u(x, 0) = { 1, |x| < 1;  0, |x| > 1 }.

5.45. D = {(x, y) : −∞ < x < ∞, 0 < y < ∞}; u(x, 0) = 1/(4 + x²).
5.46. D = {(x, y) : − ∞ < x < ∞, 0 < y < ∞}; u(x, 0) = cos x.

5.10 Suggestion for Further Reading


Applications of MATHEMATICA to well known partial differential equa-
tions are explained well in Adzievski and Siddiqi [1]. As we saw in this chapter,
partial differential equations constitute a major field of study in contempo-
rary mathematics and also in other fields such as differential geometry and
probability theory. Partial differential equations are also used widely in the-
oretical and applied aspects of physics, mechanics, engineering, chemistry,
bioscience, medicine, meteorology, climatology, and economics. As discussed
earlier, several numerical and analytical methods have been devised to solve
various types of partial differential equations. A recent example is the finite
volume method. Interested readers can obtain more information in Chapter
10 of a book by Le Dret and Lucquin [8] that provides interesting accounts of
partial differential equation modeling, analysis and numerical approximation.
References [2] through [7] are useful adjuncts for further study of the contents
of this chapter.
Bibliography

[1] K. Adzievski, A. H. Siddiqi, Introduction to Partial Differential Equations for Scientists and Engineers Using Mathematica, CRC Press, 2014.

[2] N. Asmar, Partial Differential Equations with Fourier Series and Boundary Value Problems, Second Edition, Pearson, 1992.

[3] W. E. Boyce, R. C. DiPrima, Elementary Differential Equations and Boundary Value Problems, Pearson, 2010.

[4] M. P. Coleman, An Introduction to Partial Differential Equations with MATLAB, Applied Mathematics and Nonlinear Science Series, Chapman & Hall, New York, 2005.

[5] G. H. Edwards, D. E. Penney, Differential Equations, Computing and Modeling, Pearson, 2008.

[6] L. C. Evans, Partial Differential Equations: Graduate Studies in Mathematics, American Mathematical Society, Providence, 2010.

[7] G. B. Folland, Fourier Series and its Applications, Wadsworth & Brooks, Pacific Grove CA, 1992.

[8] H. Le Dret, B. Lucquin, Partial Differential Equations: Modeling, Analysis and Numerical Approximation, Birkhäuser, 2016.

Chapter 6
Algorithmic Optimization

6.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446


6.1.1 Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
6.2 Analysis of Quadratic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
6.2.1 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
6.2.2 Line Search Descent Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 451
6.2.3 The Method of Steepest Descent . . . . . . . . . . . . . . . . . . . . . . . . 451
6.2.4 Conjugate Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
6.3 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
6.4 Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
6.4.1 Tableau Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
6.5 Complementarity Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
6.5.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
6.6 Variational Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
6.6.1 Variational Inequality Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 463
6.6.2 Systems of Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
6.6.3 Optimization and Variational Inequalities . . . . . . . . . . . . . . 463
6.7 Queuing Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
6.7.1 Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
6.7.2 Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
6.8 Iterative Methods and Pre-conditioning . . . . . . . . . . . . . . . . . . . . . . . . . 471
6.8.1 Norms of Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 471
6.8.2 General Iterative Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
6.8.3 Jacobi Iterative Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
6.8.4 The Gauss-Seidel Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
6.9 Krylov Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
6.9.1 Arnoldi’s Orthogonalization Method . . . . . . . . . . . . . . . . . . . . 478
6.9.2 Arnoldi’s Method for Linear Systems . . . . . . . . . . . . . . . . . . . 478
6.9.3 Conjugate Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
6.9.4 Preconditioned Conjugate Gradient Method . . . . . . . . . . . . 484
6.9.5 Generalized Minimum Residual Method (GMRES) . . . . 486
6.10 Multi-grid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
6.10.1 Multi-grid Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
6.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
6.12 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492


6.1 Basic Concepts


An optimization problem is characterized by its specific objective function
that is to be maximized or minimized, depending upon the problem and, in
the case of a constrained problem, a given set of constraints. Possible objec-
tive functions include expressions representing profits, costs, market share,
portfolio risk and other calculations. Possible constraints include those that
represent limited budgets or resources, non-negativity constraints on variables
and conservation equations.

6.1.1 Gradient Method


In this section we will study the problem of minimizing a function of n variables. If f is a function from Rⁿ to R, the gradient of f at a point x is the vector G = (G₁, ..., G_n)ᵀ whose components are

G_i = G_i(x) = ∂f/∂x_i,  (1 ≤ i ≤ n).   (6.1)

The gradient is also denoted by ∇f(x) or simply by f′(x).


The Hessian of f at x is the matrix H whose entries are

H_ij = H_ij(x) = ∂²f/(∂x_i ∂x_j).   (6.2)

With the gradient and Hessian in hand, we can write the linear and quadratic terms in the Taylor expansion of f:

f(x + h) = f(x) + G(x)ᵀh + (1/2)hᵀH(x)h + ... .   (6.3)
The gradient vector points in the direction of steepest ascent (increase), and its negative points in the direction of steepest descent (decrease). This important fact is proved by taking any unit vector u and considering how the function changes locally at x in the direction of u. This can be seen by computing the directional derivative

(d/dt) f(x + tu) |_{t=0}.   (6.4)

By using the Taylor formula (6.3), with h replaced by tu, we obtain

f(x + tu) = f(x) + tG(x)ᵀu + (1/2)t²uᵀH(x)u + ...   (6.5)

whence

(d/dt) f(x + tu) = G(x)ᵀu + tuᵀH(x)u + ... .   (6.6)

Using t = 0, we obtain

(d/dt) f(x + tu) |_{t=0} = G(x)ᵀu   (6.7)

as the rate of change of f at x in the direction of u. By the Cauchy–Schwarz inequality,

|G(x)ᵀu| ≤ ‖G(x)‖ ‖u‖ = ‖G(x)‖.   (6.8)

That is, this rate of change cannot exceed ‖G(x)‖. On the other hand, we can attain this upper bound by taking u to be the unit vector in the direction of G(x).
An obvious strategy for locating a minimum point of f is to start at any point x and determine the direction in which f decreases at the fastest rate, that is, the direction of −G(x). A line search can be carried out on {x − tG(x) : t > 0}; we then start over by computing a new direction of steepest descent, and so on.
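This strategy can be sketched in a few lines (Python; the test function and the Armijo backtracking constants below are illustrative choices, not from the text):

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10000):
    """Gradient descent with a simple backtracking line search along -G(x);
    a minimal sketch of the strategy described above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # gradient (nearly) vanishes: stop
            break
        t = 1.0
        # Armijo backtracking: shrink t until f decreases sufficiently
        while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
            t *= 0.5
        x = x - t * g
    return x

# minimize f(x) = (x1 - 1)^2 + 10*(x2 + 2)^2, whose minimizer is (1, -2)
f = lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
x_min = steepest_descent(f, grad, [0.0, 0.0])
```

On this ill-conditioned quadratic the iterates zig-zag toward the minimizer, a well-known weakness of pure steepest descent that motivates the conjugate gradient method discussed later in the chapter.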

6.2 Analysis of Quadratic Functions


The problem consists of finding a vector x∗ (called a local minimizer) such that

f(x∗) = min_x f(x) for x in F,

where F is a subset of Rⁿ.
If f is continuously differentiable in the vicinity of x∗, sufficient conditions for x∗ to be a strong local minimum are:

G(x∗) = 0

and

H(x∗) positive definite.

A common strategy in minimization problems is to assume that our function is locally approximately a quadratic polynomial. A general quadratic function of n variables is of the form

f(x) = (1/2)xᵀAx − bᵀx + a.   (6.9)

Here, a is a scalar and b is a constant vector having n components. The matrix A is n × n, symmetric, and constant (constant entries). The right side of the equation displays (from left to right) the term of exact degree 2, the terms of exact degree 1, and the term of degree 0. In the case of two variables, the degree-2 terms are of the three types x₁², x₂², and x₁x₂.
Because quadratic functions should be good local models for any given
function that is twice differentiable, we will derive some basic facts about
them. First, the full form of f is
f(x) = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} x_i x_j A_{ij} − Σ_{i=1}^{n} b_i x_i + a   (6.10)

Next, we compute the gradient of f, which is by definition the vector having components

∂f/∂x_k = (1/2) Σ_{j=1}^{n} x_j A_{kj} + (1/2) Σ_{i=1}^{n} x_i A_{ik} − b_k   (6.11)

Because the matrix A is symmetric, this equation becomes

∂f/∂x_k = Σ_{j=1}^{n} A_{kj} x_j − b_k   (6.12)

Thus, the vector expression for the gradient is

G(x) = ∇f (x) = Ax − b (6.13)

To find the critical point of f, that is, the point where the gradient of f vanishes, we set G(x) = 0 and note that this occurs when Ax = b or, assuming A is invertible, when

x = A^{-1} b   (6.14)
Example 242. Write the quadratic function

f (x) = 4x21 − 2x1 x2 + 3x22 + 3x1 − 2x2 + 1

in the standard vector-matrix notation, and find the minimizer and the min-
imum value for the function.
Solution:

f(x) = (x1, x2) [ 4  −1 ; −1  3 ] (x1, x2)^T − 2 (x1, x2) (−3/2, 1)^T + 1

whereby

A = [ 4  −1 ; −1  3 ],   b = (−3/2, 1)^T.

From (6.13) and (6.14), the minimizer can be found by solving the linear system Ax = b, or

[ 4  −1 ; −1  3 ] (x1, x2)^T = (−3/2, 1)^T.   (6.15)

Applying the usual Gaussian elimination algorithm, only one row operation is required to reduce the coefficient matrix to upper triangular form. This gives x* = (x1*, x2*)^T = (−7/22, 5/22)^T = (−0.31818, 0.22727)^T. The minimal value for f is f(x*) = f(−7/22, 5/22) = 13/44 ≈ 0.29546.
It is instructive to compare the algebraic solution with the minimization procedure learned in multivariable calculus. The critical points of f(x1, x2) are found by setting both partial derivatives equal to zero:

∂f/∂x1 = 8x1 − 2x2 + 3 = 0
∂f/∂x2 = −2x1 + 6x2 − 2 = 0

or

4x1 − x2 = −3/2
−x1 + 3x2 = 1.

This is precisely the same linear system we already constructed in (6.15).
To check whether x* is a (local) minimum, we need to analyze the Hessian matrix, which is the symmetric matrix of second order derivatives evaluated at (x1*, x2*)^T = (−7/22, 5/22)^T:

H = [ ∂²f/∂x1²  ∂²f/∂x1∂x2 ; ∂²f/∂x2∂x1  ∂²f/∂x2² ] = [ 8  −2 ; −2  6 ] = 2A.

If the Hessian matrix is positive definite, as we have already verified in this case, then the critical point is indeed a (local) minimum.
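Example 242 is easy to check numerically. The short NumPy sketch below (not part of the text) solves Ax = b for the minimizer and confirms that the Hessian H = 2A is positive definite by inspecting its eigenvalues.

```python
import numpy as np

# Example 242: f(x) = x^T A x - 2 x^T b + 1.  The minimizer solves A x = b,
# and positive definiteness of the Hessian H = 2A confirms it is a minimum.
A = np.array([[4.0, -1.0], [-1.0, 3.0]])
b = np.array([-1.5, 1.0])

x_star = np.linalg.solve(A, b)        # minimizer (-7/22, 5/22)

def f(x):
    return x @ A @ x - 2 * x @ b + 1.0

eigvals = np.linalg.eigvalsh(2 * A)   # Hessian eigenvalues, all positive

print(x_star)              # approx [-0.31818  0.22727]
print(f(x_star))           # approx 0.29545 (= 13/44)
print(eigvals.min() > 0)   # True
```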
Example 243. Minimize the quadratic function
f (x) = x21 + 2x1 x2 + x1 x3 + 2x22 + x2 x3 + 2x23 + 6x2 − 7x3 + 5
Solution: This has the matrix form f(x) = x^T A x − 2 b^T x + c with

A = [ 1  1  1/2 ; 1  2  1/2 ; 1/2  1/2  2 ],   x = (x1, x2, x3)^T,   b = (0, −3, 7/2)^T,   c = 5.

Gaussian elimination produces the LDL^T factorization

A = [ 1  0  0 ; 1  1  0 ; 1/2  0  1 ] [ 1  0  0 ; 0  1  0 ; 0  0  7/4 ] [ 1  1  1/2 ; 0  1  0 ; 0  0  1 ].

The pivots, i.e., the diagonal entries of D, are all positive, and hence A is
positive definite. Therefore, f (x) has a unique minimizer x∗ = (x∗1 , x∗2 , x∗3 ),
which is found by solving the linear system Ax = b. The solution is then
quickly obtained by forward and backward substitution:

x∗1 = 2, x∗2 = −3, x∗3 = 2

with
f (x∗ ) = f (2, −3, 2) = −11
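A quick numerical cross-check of Example 243 (my own, not from the text): a Cholesky factorization exists exactly when all the LDL^T pivots are positive, so attempting one verifies positive definiteness, after which the minimizer and minimum value follow from Ax = b.

```python
import numpy as np

# Example 243: A is positive definite iff np.linalg.cholesky succeeds
# (equivalently, all pivots of the LDL^T factorization are positive).
A = np.array([[1.0, 1.0, 0.5],
              [1.0, 2.0, 0.5],
              [0.5, 0.5, 2.0]])
b = np.array([0.0, -3.0, 3.5])

np.linalg.cholesky(A)            # raises LinAlgError if A were not PD

x_star = np.linalg.solve(A, b)   # unique minimizer

def f(x):                        # f(x) = x^T A x - 2 b^T x + c, c = 5
    return x @ A @ x - 2 * b @ x + 5.0

print(x_star, f(x_star))         # approx [2, -3, 2] and -11
```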

6.2.1 Newton’s Method


The general indirect method for determining x* is to solve the system of equations G(x) = 0 by some numerical method to yield all stationary points. A popular method to achieve this is Newton's method. Since in general the system will be nonlinear, multiple stationary points are possible. These stationary points must then be further analyzed in order to determine whether or not they are local minima.
Assume x* is a local minimum and x_k an approximate solution, with associated unknown error δ such that x* = x_k + δ. Then by applying Taylor's theorem and the first necessary condition for a minimum at x*, it follows that

0 = G(x*) = G(x_k + δ) = G(x_k) + H(x_k)δ + O(‖δ‖²).

If x_k is a good approximation, then a good estimate of δ is obtained by solving the linear system

H(x_k)δ + G(x_k) = 0,

obtained by ignoring the second order term in δ above. A better approximation is therefore expected to be x_{k+1} = x_k + δ, which leads to the Newton iterative scheme: given an initial approximation x_0, compute

x_{k+1} = x_k − H^{-1}(x_k) G(x_k),   k = 0, 1, 2, ...   (6.16)

Newton's method applied to a quadratic problem

Consider the unconstrained problem

Minimize f(x) = (1/2) x^T A x − b^T x + a   (6.17)

In this case the first iteration in (6.16) yields

x_1 = x_0 − A^{-1}(A x_0 − b) = x_0 − x_0 + A^{-1} b = A^{-1} b,

that is,

x_1 = x* = A^{-1} b.
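Scheme (6.16) can be sketched in a few lines. The code below (a sketch, not the text's own; it reuses the A, b of Example 242 for illustration) solves H(x_k)δ = −G(x_k) at each step, and for a quadratic it reaches the minimizer in a single Newton step, as derived above.

```python
import numpy as np

# Newton's method (6.16): x_{k+1} = x_k - H(x_k)^{-1} G(x_k), implemented by
# solving H(x_k) delta = -G(x_k) rather than forming the inverse.
def newton(G, H, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        delta = np.linalg.solve(H(x), -G(x))
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x

# For a quadratic, G(x) = A x - b and H(x) = A, so one step lands on A^{-1} b.
A = np.array([[4.0, -1.0], [-1.0, 3.0]])
b = np.array([-1.5, 1.0])
x = newton(lambda x: A @ x - b, lambda x: A, [10.0, 10.0])
print(x)    # approx [-0.31818  0.22727], reached in one Newton step
```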

6.2.2 Line Search Descent Algorithm


Over the last four decades a number of powerful direct search methods have been developed for the unconstrained minimization of general functions. These algorithms require an initial estimate of the optimum point, denoted by x_0. With this estimate as starting point, the algorithm generates a sequence of estimates x_0, x_1, x_2, ... by successively searching directly from each point in a direction of descent to determine the next point. The process is terminated either if no further progress is made, or if a point x_k is reached (for smooth functions) at which the necessary condition G(x) = 0 is sufficiently accurately satisfied, in which case x* = x_k.

An important sub-class of direct search methods, specifically suitable for smooth functions, are the so-called line search descent methods. Basic to these methods is the selection of a descent direction u_{k+1} at each iterate x_k that ensures descent at x_k in the direction u_{k+1}; that is, it is required that the directional derivative in the direction u_{k+1} be negative:

df(x_k)/dλ |_{u_{k+1}} = G(x_k)^T u_{k+1} < 0.

General structure of a line search descent method

1. Given starting point x_0 and positive tolerances ε_1, ε_2 and ε_3, set k = 1.
2. Select a descent direction u_k.
3. Perform a one-dimensional line search in direction u_k, i.e.,

min_λ F(λ) = min_λ f(x_{k−1} + λ u_k),

to give minimizer λ_k.
4. Set x_k = x_{k−1} + λ_k u_k.
5. Test for convergence: if ‖x_k − x_{k−1}‖ < ε_1, or ‖G(x_k)‖ < ε_2, or |f(x_k) − f(x_{k−1})| < ε_3, then STOP and x* ≈ x_k; else go to Step 6.
6. Set k = k + 1 and go to Step 2.

6.2.3 The Method of Steepest Descent


Line search descent methods (see Section 6.2.2) that use the gradient vector G(x) to determine the search direction for each iteration are called
first order methods because they employ first order partial derivatives of f (x)
to compute the search direction at the current iterate. The simplest and most
famous of these methods is the method of steepest descent, first proposed by
Cauchy in 1847.
In this method the direction of steepest descent is used as the search
direction in the line search descent algorithm given above. The expression for

the direction of steepest descent is derived below. At x_0 we seek the unit vector u such that for F(λ) = f(x_0 + λu), the directional derivative

df(x)/dλ |_u = dF(0)/dλ = G(x_0)^T u

assumes a minimum value with respect to all possible choices for the unit vector u at x_0. Clearly, for the particular choice

u = −G(x_0)/‖G(x_0)‖,

the directional derivative at x_0 is given by

dF(0)/dλ = −G(x_0)^T G(x_0)/‖G(x_0)‖ = −‖G(x_0)‖,

which is the least possible value. Thus this particular choice for the unit vector corresponds to the direction of steepest descent. The search direction

u = −G(x_0)/‖G(x_0)‖

is called the normalized steepest descent direction at x_0.


Steepest descent algorithm
Given x_0, do for iteration k = 1, 2, ... until convergence:

1. u_k = −G(x_{k−1})/‖G(x_{k−1})‖.
2. x_k = x_{k−1} + λ_k u_k, where λ_k is such that

F(λ_k) = f(x_{k−1} + λ_k u_k) = min_λ f(x_{k−1} + λ u_k).
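The algorithm above can be sketched directly for a quadratic f(x) = (1/2)x^T Ax − b^T x, where the line search in step 2 has the closed form λ = −G^T u / (u^T A u); the test data reuse Example 242's A and b (my choice, not the text's).

```python
import numpy as np

# Steepest descent with exact line search on f(x) = (1/2) x^T A x - b^T x.
def steepest_descent(A, b, x0, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        G = A @ x - b                    # gradient at the current iterate
        if np.linalg.norm(G) < tol:
            break
        u = -G / np.linalg.norm(G)       # normalized steepest descent dir.
        lam = -(G @ u) / (u @ A @ u)     # exact minimizer of f(x + lam*u)
        x = x + lam * u
    return x, k

A = np.array([[4.0, -1.0], [-1.0, 3.0]])
b = np.array([-1.5, 1.0])
x, iters = steepest_descent(A, b, [10.0, 10.0])
print(x)    # approx A^{-1} b = (-7/22, 5/22)
```

For this well-conditioned A the iteration converges in a few dozen steps; for ill-conditioned matrices steepest descent zigzags, which motivates the conjugate gradient methods of the next subsection.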

6.2.4 Conjugate Gradient Method


In spite of its local optimal descent property, the method of steepest descent often performs poorly. There is, however, a class of first order line search descent methods, known as conjugate gradient methods, for which it can be proved that a method from this class will converge exactly in a finite number of iterations when applied to a positive-definite quadratic function of the form

f(x) = (1/2) x^T A x − b^T x + a,

where A is a positive-definite n × n real symmetric matrix.

Mutually conjugate directions
Two nonzero vectors u, v are defined to be orthogonal if the scalar product u^T v = (u, v) = 0. Two nonzero vectors u, v are said to be A-conjugate if u^T A v = (u, Av) = 0.

The Fletcher-Reeves directions

These are defined as follows:

u_1 = −G(x_0)

and for k = 1, 2, ..., n − 1,

u_{k+1} = −G(x_k) + β_k u_k,

where x_k = x_{k−1} + λ_k u_k, λ_k corresponds to the optimal descent step in iteration k, and

β_k = ‖G(x_k)‖² / ‖G(x_{k−1})‖²   (6.18)
Note that the above directions can be shown to be A-conjugate.
Fletcher-Reeves conjugate gradient algorithm for general functions
Given x_0, perform the following steps:

1. Compute G(x_0) and set u_1 = −G(x_0).
2. For k = 1, 2, ..., n do:
   2.1 x_k = x_{k−1} + λ_k u_k, where λ_k is such that

   f(x_{k−1} + λ_k u_k) = min_λ f(x_{k−1} + λ u_k)

   2.2 Compute G(x_k).
   2.3 If the convergence criteria are satisfied, then STOP and x* ≈ x_k; else go to step 2.4.
   2.4 If 1 ≤ k ≤ n − 1, set u_{k+1} = −G(x_k) + β_k u_k with β_k given by (6.18).
3. Set x_0 = x_n and go to step 1 (restart).

Example 244. Apply the Fletcher-Reeves method to minimize


f(x) = (1/2) x1² + x1 x2 + x2²

with x_0 = [10, −5]^T.
Solution: Iteration 1:

G(x) = ( x1 + x2 , x1 + 2x2 )^T

and therefore

u_1 = −G(x_0) = ( −5 , 0 )^T

and

x_1 = x_0 + λ u_1 = ( 10 − 5λ , −5 )^T

and

F(λ) = f(x_0 + λu_1) = (1/2)(10 − 5λ)² + (10 − 5λ)(−5) + 25.

For optimal descent,

dF/dλ = df(x)/dλ |_{u_1} = −5(10 − 5λ) + 25 = 0.

This gives

λ_1 = 1,   x_1 = ( 5 , −5 )^T   and   G(x_1) = ( 0 , −5 )^T.
Iteration 2:

G(x1 ) 2
     
2 1 1 0 25 −5 −5
u = −G(x ) + u =− + =
kG(x0 )k
2 −5 25 0 5

     
2 1 2 5 −5 5(1 − λ)
x = x + λu = +λ =
−5 5 −5(1 − λ)
and

1
F (λ) = f (x1 + λu2 ) = 25(1 − λ)2 − 50(1 − λ)2 + 50(1 − λ)2

2

Again for optimal descent,

dF/dλ = df(x)/dλ |_{u_2} = −25(1 − λ) = 0.

This gives

λ_2 = 1,   x_2 = ( 0 , 0 )^T   and   G(x_2) = 0.

Therefore STOP.
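Example 244 can be reproduced mechanically. The sketch below (mine, not the book's code) runs Fletcher-Reeves on f(x) = (1/2)x^T Ax with A = [[1, 1], [1, 2]], using the exact quadratic line-search step λ = −G^T u / (u^T A u), and terminates at the minimizer after exactly n = 2 iterations.

```python
import numpy as np

# Fletcher-Reeves conjugate gradients on the quadratic of Example 244:
# f(x) = (1/2) x1^2 + x1 x2 + x2^2 = (1/2) x^T A x.
A = np.array([[1.0, 1.0], [1.0, 2.0]])

def G(x):
    return A @ x                 # gradient of (1/2) x^T A x

x = np.array([10.0, -5.0])
u = -G(x)                        # u1 = -G(x0) = (-5, 0)
for k in range(2):               # n = 2 variables -> at most 2 iterations
    lam = -(G(x) @ u) / (u @ A @ u)          # optimal descent step
    g_old = G(x)
    x = x + lam * u
    beta = (G(x) @ G(x)) / (g_old @ g_old)   # Fletcher-Reeves beta, (6.18)
    u = -G(x) + beta * u

print(x)   # the minimizer (0, 0), reached in exactly two iterations
```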

6.3 Linear Programming


The term linear programming (LP) does not refer to the programming of a computer but to the programming of business or economic enterprises.

An explicit and technical meaning has been assigned to the term. It means
finding the maximum value of a linear function of n variables over a convex
polyhedral set in Rn .
The goal of an LP problem is to maximize or minimize (optimize) a linear objective function f in n decision variables x1, x2, ..., xn subject to a set of
m constraints.
Let c = (c1, c2, ..., cn)^T ∈ R^n, b = (b1, ..., bm)^T ∈ R^m, and let A = (aij) be an m × n matrix. This is called a standard maximization problem if it has
the form: Find the maximum of

f = c1 x1 + c2 x2 + ... + cn xn (6.19)

subject to the constraints

a11 x1 + a12 x2 + ... + a1n xn ≤ b1   (6.20)
a21 x1 + a22 x2 + ... + a2n xn ≤ b2
...
am1 x1 + am2 x2 + ... + amn xn ≤ bm

where the decision variables x1 , x2 , ..., xn are non-negative. In matrix form,


find the maximum of

f = cT x, x ∈ Rn , (6.21)
subject to the constraint
Ax ≤ b, and x ≥ 0

We recall that if x = (x1 , ..., xn )T , then the vector inequality x ≥ 0 means


that xi ≥ 0 for all i = 1, 2, ..., n. Likewise, the inequality Ax ≤ b means that
n
X
aij xj ≤ bi for all i = 1, 2, ..., m (6.22)
j=1

Next we describe some common terminology. The feasible set in our


problem is the set
K = {x ∈ Rn : Ax ≤ b, x ≥ 0} . (6.23)
The value of the problem is the number

v = sup { c^T x : x ∈ K }.   (6.24)

A feasible point is any element of K. A solution or optimal feasible point is any x ∈ K such that c^T x = v. The function x ↦ c^T x = Σ_{j=1}^{n} c_j x_j is the objective function. Since the problem is completely determined by the data A, b, and c, we refer to it as the linear programming problem (A, b, c).

TABLE 6.1

      P1   P2   Time Constraints (hr)
M1    1    2        ≤ 40
M2    1    1        ≤ 30

Example 245. A company makes products P1 and P2 using two machines M1 and M2. Producing one unit of P1 requires one hour on each of M1 and M2, or two hours in total. Producing one unit of P2 requires two hours on M1 and one hour on M2. Due to the need for regular servicing, M1 and M2 can only run for 40 and 30 hours per week, respectively. The preceding data are organized in Table 6.1.

The company expects net profits of $20 and $30, respectively, on each unit of P1 and P2 sold. Write down the LP problem: how should the company schedule production in order to maximize net weekly profit?
Solution: The model translates directly into a linear programming problem in standard form with n = m = 2 in (6.19) and (6.20). Let x1 and x2 represent, respectively, the number of units of products P1 and P2 produced per week. The company's objective is to maximize

f = 20x1 + 30x2 (net profit per week)

subject to

x1 + 2x2 ≤ 40 (hours, M1)
x1 + x2 ≤ 30 (hours, M2)
x1 ≥ 0, x2 ≥ 0.
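Small LPs like this are routinely handed to a solver. The sketch below (not part of the text) uses scipy.optimize.linprog, which minimizes, so the profit vector is negated to obtain the maximum.

```python
from scipy.optimize import linprog

# Example 245 as a call to scipy.optimize.linprog.
c = [-20, -30]                  # negate to maximize f = 20 x1 + 30 x2
A_ub = [[1, 2],                 # M1: x1 + 2 x2 <= 40
        [1, 1]]                 # M2: x1 +   x2 <= 30
b_ub = [40, 30]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)          # optimal plan (20, 10), weekly profit 700
```

The optimum sits at the vertex where both machine constraints are binding, i.e. x1 + 2x2 = 40 and x1 + x2 = 30.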

6.4 Simplex Method


The simplex method was invented by G.B. Dantzig in 1949. His mono-
graph (Dantzig 1963) is the classical reference. Most texts describe the sim-
plex method as a sequence of pivots on a table of numbers called the simplex
tableau.
We shall describe an algebraic step-by-step procedure, called the simplex
algorithm, that can be applied to solve LP problems with any number of
decisions and constraints. Let us first illustrate this algorithm by the following
linear programming (LP) problem: Maximize the objective function

f = 6x1 + 14x2

TABLE 6.2

6 14 0 0 0 0
2 1 1 0 0 12
2 3 0 1 0 15
1 7 0 0 1 21
0 0 12 15 21

subject to the constraints

2x1 + x2 ≤ 12
2x1 + 3x2 ≤ 15
x1 + 7x2 ≤ 21
x1 ≥ 0, x2 ≥ 0.

In preparation for the simplex algorithm, we introduce slack variables and rewrite the problem as follows:

Maximize: F = 6x1 + 14x2 + 0x3 + 0x4 + 0x5

subject to the constraints

2x1 + x2 + x3 = 12
2x1 + 3x2 + x4 = 15
x1 + 7x2 + x5 = 21
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0.

Hence, in matrix form:

Maximize: F = c^T x with c = (6, 14, 0, 0, 0)^T

subject to the constraints

[ 2 1 1 0 0 ; 2 3 0 1 0 ; 1 7 0 0 1 ] x = ( 12 , 15 , 21 )^T

x = (x1, x2, x3, x4, x5)^T ≥ 0.

Our first vector will be x = (0, 0, 12, 15, 21)^T. All these data are summarized in the first tableau (Table 6.2).

Every step of the simplex method begins with a tableau. The top row contains coefficients that pertain to the objective function F. The current value of F(x) = c^T x is displayed in the top right corner. The next m rows in the tableau represent a system of equations embodying the equality constraints.

TABLE 6.3

cT 0 F (x)
A I b
x (nonbasic) x (basic)

It is worth remembering that elementary row operations can be performed on this system of equations without altering the set of solutions. The last row of the tableau contains the current x-vector. Notice that F(x) = c^T x is easily computed using the top row and the bottom row. The preceding tableau is of the form shown in Table 6.3.

6.4.1 Tableau Rules


Each tableau that occurs in the simplex method must satisfy these five
rules:
1. The x-vector must satisfy the equality constraint Ax = b.
2. The x-vector must satisfy the inequality constraint x ≥ 0.
3. There are n components of x (called nonbasic variables) that are
0. The remaining m components are usually nonzero and are called basic
variables.
4. In the matrix that defines the constraints, each basic variable occurs in only one row.
5. The objective function F must be expressed only in terms of nonbasic
variables.
In each step, we examine the current tableau to see whether the value of F(x) can be increased by allowing a nonbasic variable to become a basic variable. In our example, we see that if we allow x1 or x2 to increase (and compensate by adjusting x3, x4, x5), then the value of F(x) will indeed increase. Because the coefficient 14 in F is greater than the coefficient 6, a unit increase in x2 will increase F faster than a unit increase in x1. Hence, we hold x1 fixed at 0 and allow x2 to increase as much as possible. These constraints apply:

0 ≤ x3 = 12 − x2
0 ≤ x4 = 15 − 3x2
0 ≤ x5 = 21 − 7x2

These constraints tell us that

x2 ≤ 12,   x2 ≤ 5,   x2 ≤ 3.

The most stringent of these is the inequality x2 ≤ 3, and therefore x2 is allowed

TABLE 6.4

4      0   0   0   −2     42
13/7   0   1   0   −1/7    9
11/7   0   0   1   −3/7    6
1      7   0   0    1     21
0      3   9   6    0

to increase to 3. The resulting values of x3, x4, and x5 are obtained from the three given constraints. Hence, our new x-vector is

x = ( 0 , 3 , 9 , 6 , 0 )^T

The new basic variables are x2, x3, and x4, and we must now determine the next tableau in accordance with the preceding five rules. In order to satisfy Rule 5, we note that x2 = (21 − x1 − x5)/7. When this is substituted in F, we find a new form for the objective function:

F = 6x1 + 14x2 = 6x1 + 2(21 − x1 − x5) = 4x1 − 2x5 + 42.

To satisfy Rule 4, Gaussian elimination steps (elementary row operations) are applied, using 7 as the pivot element. The purpose of this is to eliminate x2 from all but one equation. After all this work has been carried out, Step 1 is finished and Step 2 begins with the second tableau, shown in Table 6.4.

The situation presented now is similar to that at the beginning. The nonbasic variables are x1 and x5. Any increase in x5 will decrease F(x), and so it is x1 that is now allowed to become a basic variable. Hence, we hold x5 fixed at 0 and allow x1 to increase as much as possible. These constraints apply:

0 ≤ x3 = 9 − 13

7 x2
0 ≤ x4 = 6 − 11

7 x2
0 ≤ 7x2 = 21 − x1

These lead to

x1 ≤ 63/13,   x1 ≤ 42/11,   x1 ≤ 21.
The new basic variable x1 is allowed to increase to only 42/11, and the new values of x2 and x3 are computed from the tableau or from the constraint equations immediately above. The new x-vector is

x = ( 42/11 , 27/11 , 21/11 , 0 , 0 )^T

The nonbasic variables are now x4 and x5. To satisfy Rule 5, we use the substitution x1 = (7/11)(6 − x4 + (3/7)x5). Then

F(x) = 4x1 − 2x5 + 42
     = (28/11)(6 − x4 + (3/7)x5) − 2x5 + 42
     = 630/11 − (28/11)x4 − (10/11)x5.

It is not necessary to complete the third tableau because both coefficients in F are negative. This means that the current x is a solution, because neither of the variables x4 and x5 can become a basic variable without decreasing F(x). Thus, in the original problem, the maximum value is F(42/11, 27/11) = 630/11.

Summary
On the basis of this example and the explanation, we can summarize the
work to be done on any given tableau as follows:
1. If all coefficients in F (that is, the top row in the tableau) are ≤ 0, then
the current x is the solution.
2. Select the nonbasic variable whose coefficient in F is positive and as
large as possible. This becomes a new basic variable. Call it xj .
3. Divide each bi by the coefficient of the new basic variable in that row, aij. The value assigned to the new basic variable is the least of these ratios. Thus, if bk/akj is the least ratio, we set xj = bk/akj.
4. Using the pivot element akj, create 0's in column j of A with Gaussian elimination steps.
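The four summary rules translate almost line for line into code. The sketch below (my own compact implementation, not the book's) assumes b ≥ 0 and a bounded, non-degenerate problem, and reproduces the worked example's answer.

```python
import numpy as np

def simplex_max(c, A, b):
    """Tableau simplex for: maximize c@x subject to A@x <= b, x >= 0."""
    m, n = A.shape
    # Tableau: m constraint rows [A | I | b] plus objective row [c | 0 | 0].
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[m, :n] = c
    basis = list(range(n, n + m))        # slack variables start out basic
    while True:
        j = int(np.argmax(T[m, :-1]))    # entering: largest positive coeff.
        if T[m, j] <= 1e-12:             # all coefficients <= 0: optimal
            break
        ratios = [T[i, -1] / T[i, j] if T[i, j] > 1e-12 else np.inf
                  for i in range(m)]
        k = int(np.argmin(ratios))       # leaving row: least-ratio rule
        T[k] /= T[k, j]                  # pivot: scale the pivot row ...
        for i in range(m + 1):
            if i != k:
                T[i] -= T[i, j] * T[k]   # ... and create 0's in column j
        basis[k] = j
    x = np.zeros(n + m)
    for row, var in enumerate(basis):
        x[var] = T[row, -1]
    return x[:n], -T[m, -1]              # objective row ends up holding -F

c = np.array([6.0, 14.0])
A = np.array([[2.0, 1.0], [2.0, 3.0], [1.0, 7.0]])
b = np.array([12.0, 15.0, 21.0])
x, val = simplex_max(c, A, b)
print(x, val)    # approx (42/11, 27/11) and 630/11 = 57.27...
```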

6.5 Complementarity Problems


Complementarity is a new domain of applied mathematics, having deep
relations with several aspects of fundamental mathematics and numerical anal-
ysis. Complementarity problems represent a wide class of mathematical mod-
els related to optimization, economics, engineering, mechanics, elasticity, fluid
mechanics and game theory.
Complementarity problems arise naturally in the study of many phenom-
ena in economics and engineering. A comprehensive and excellent treatment
of applications of complementarity problems is provided in [8] and [14].

6.5.1 Problem Statement


A linear complementarity problem consists of finding a vector in a finite-
dimensional real vector space that satisfies a certain system of inequalities.

More precisely, given a vector q ∈ Rn and a matrix M ∈ Rn×n , the linear


complementarity problem (LCP) is to find a vector z ∈ R^n such that

z ≥ 0,   q + Mz ≥ 0,   z^T (q + Mz) = 0   (6.25)

or equivalently: find w = (wj) ∈ R^n and z = (zj) ∈ R^n satisfying

w − Mz = q,   w, z ≥ 0,   w^T z = 0   (6.26)
We denote the above LCP by the pair (q, M ). The name comes from the
third condition, the complementarity condition which requires that at least
one variable in the pair (wj , zj ) should be equal to 0 in the solution of the
problem, for each j = 1 to n.
Example: The Obstacle Problem
The obstacle problem consists of finding the equilibrium position of an
elastic membrane that is held at a fixed position on its boundary and which
lies over an obstacle.
Consider stretching an elastic string fixed at the endpoints (0, 0) and (4, 0) over an obstacle defined by a function ψ (in this example, we use ψ(x) = 1 − (x − 2.2)²). Notice that the position of the string will be defined by ψ(x) for x between unknown points P and Q, and that in the intervals 0 ≤ x ≤ P and Q ≤ x ≤ 4, the string lies along straight line segments connecting (0, 0) to (P, ψ(P)) and (Q, ψ(Q)) to (4, 0), respectively. If we represent the equilibrium position of the string by the function u, then u must satisfy conditions involving the unknown contact points P and Q.
This representation of the problem is complicated by the presence of the
free boundaries P and Q. The complementarity framework allows a simpler
representation, which does not require free boundaries. First, note that since
there is no downward force on the string, u00 (x) ≤ 0 for all x, except possibly at
x = P or x = Q where u00 may be discontinuous. Also, note that u(x) ≥ ψ(x)
everywhere. Finally, at each point x, either u00 (x) = 0 or u(x) = ψ(x). Thus,
if we ignore momentarily the discontinuity of u00 at P and Q, we see that u
must satisfy the conditions

 u(x) ≥ ψ(x), 0 ≤ x ≤ 4
 00

u (x) ≤ 0
(6.27)

 (u(x) − ψ(x))u00 (x) = 0
u(0) = u(4) = 0.

This system can be solved numerically using a finite difference or finite element scheme. For example, using a central difference scheme on a regular mesh with step size h = 4/n, u is approximated by the vector u = (u0, u1, ..., un), where ui ≈ u(xi), xi = ih, i = 0, ..., n, and u0 = un = 0. The above system is then approximated by

u0 = un = 0
(u_{i−1} − 2u_i + u_{i+1})/h² ≤ 0,   i = 1, ..., n − 1
u_i − ψ(x_i) ≥ 0
(u_i − ψ(x_i)) · (u_{i−1} − 2u_i + u_{i+1})/h² = 0.
Using the simple change of variables z_i = u_i − ψ(x_i), this system is equivalent to the linear complementarity problem (q, M), where M is the (n − 1) × (n − 1) tridiagonal matrix

M =
[  2  −1   0   0  ...   0 ]
[ −1   2  −1   0        0 ]
[  0  −1   2  −1        0 ]
[          ...            ]
[  0  ...      −1   2  −1 ]
[  0  ...       0  −1   2 ]

and q is the (n − 1)-vector defined by

q =
[  2ψ(x1) − ψ(x2)                          ]
[ −ψ(x1) + 2ψ(x2) − ψ(x3)                  ]
[  ...                                     ]
[ −ψ(x_{n−3}) + 2ψ(x_{n−2}) − ψ(x_{n−1})   ]
[ −ψ(x_{n−2}) + 2ψ(x_{n−1})                ]

(Here q_i = −(ψ(x_{i−1}) − 2ψ(x_i) + ψ(x_{i+1})), with the boundary terms dropped since u_0 = u_n = 0, so that w = q + Mz equals the nonnegative quantity −(u_{i−1} − 2u_i + u_{i+1}).) The solution z = (z1, ..., z_{n−1}) of this LCP then gives the discrete approximation to u at the interior points by the relation u_i = z_i + ψ(x_i), i = 1, ..., n − 1.
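This LCP can be solved by many methods; one simple option (not described in the text) is a projected Gauss-Seidel sweep, which converges for symmetric positive definite M such as the tridiagonal matrix above. The mesh size n = 40 and the solver are my own choices; the test simply checks the three LCP conditions.

```python
import numpy as np

# Projected Gauss-Seidel for the LCP (q, M):
#   z >= 0,  w = q + M z >= 0,  z^T w = 0.
def lcp_pgs(M, q, sweeps=10_000):
    z = np.zeros(len(q))
    for _ in range(sweeps):
        for i in range(len(q)):
            r = q[i] + M[i] @ z - M[i, i] * z[i]   # residual excluding z_i
            z[i] = max(0.0, -r / M[i, i])          # project onto z_i >= 0
    return z

# Obstacle-problem data from the text: psi(x) = 1 - (x - 2.2)^2 on [0, 4].
n = 40
xs = np.linspace(0.0, 4.0, n + 1)
psi = 1.0 - (xs - 2.2) ** 2
m = n - 1                                          # interior unknowns
M = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
q = 2 * psi[1:-1].copy()
q[:-1] -= psi[2:-1]                                # -psi(x_{i+1}) terms
q[1:] -= psi[1:-2]                                 # -psi(x_{i-1}) terms

z = lcp_pgs(M, q)
u = z + psi[1:-1]          # string position at the interior mesh points
w = q + M @ z
print(z.min() >= 0, w.min() > -1e-8, abs(z @ w) < 1e-8)   # LCP conditions
```

In the computed solution z vanishes on the contact region (where u = ψ) and is strictly positive near the endpoints, where the string runs along the straight segments.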

6.6 Variational Inequalities


Variational inequality theory was introduced by Hartman and Stampac-
chia (1966) as a tool for the study of partial differential equations with ap-
plications arising from mechanics. Such variational inequalities were infinite-
dimensional rather than finite-dimensional as we describe in this section. The
breakthrough in finite-dimensional theory of variational inequalities occurred
in 1980 when Dafermos established that the traffic network equilibrium con-
ditions of Smith (1979) contained a formulation of a variational inequality.
This gave birth to a new methodology for the study of problems in economics,

management science, operation research, and also in engineering with a focus


on transportation. Variational inequality theory is a powerful tool for formu-
lating a variety of equilibrium problems. It contains, as special cases, such
well-known problems in mathematical programming as systems of non-linear
equations, optimization problems, complementarity problems.

6.6.1 Variational Inequality Problem


The finite-dimensional variational inequality problem consists of seeking x* ∈ K ⊂ R^n such that

F(x*)^T (x − x*) ≥ 0   ∀x ∈ K   (6.28)

or equivalently

⟨F(x*), x − x*⟩ ≥ 0   ∀x ∈ K,   (6.29)

where K is a closed convex set, F is a given continuous function from K to R^n, and ⟨., .⟩ denotes the inner product in n-dimensional Euclidean space.
In geometric terms, the variational inequality (6.28) states that F(x*) forms a non-obtuse angle with every feasible direction x − x* at the point x*. This formulation is particularly convenient because it allows for a unified treatment of equilibrium
problems and optimization problems. Indeed, many mathematical problems
can be formulated as variational inequality problems. Some examples are given
below.

6.6.2 Systems of Equations


Many classical economic equilibrium problems have been formulated as
systems of equations, since market clearing conditions necessarily equate the
total supply with total demand. In terms of a variational inequality problem,
the formulation of a system of equations is as follows.
Let K = R^n and let F be a function from R^n into itself. A vector x* ∈ R^n solves VI(F, R^n) if and only if F(x*) = 0.
Indeed, if F(x*) = 0, then (6.29) holds with equality. Conversely, if x* satisfies (6.29), let x = x* − F(x*). Then

⟨F(x*), −F(x*)⟩ ≥ 0, or equivalently −‖F(x*)‖² ≥ 0,

and therefore F(x*) = 0.

6.6.3 Optimization and Variational Inequalities


Both unconstrained and constrained optimization problems can be for-
mulated as variational inequality problems. Next we show the relationship
between an optimization problem and a variational inequality problem.

Proposition 1. Let x∗ be a solution to the optimization problem:

Minimize f (x) (6.30)

subject to: x ∈ K

where f is a continuously differentiable function and K is closed and convex.


Then x∗ is a solution of the variational inequality problem

∇f (x∗ )T .(x − x∗ ) ≥ 0 ∀x ∈ K (6.31)

Proof. Let φ(t) = f(x* + t(x − x*)), for t ∈ [0, 1]. Since φ(t) achieves its minimum at t = 0, we have 0 ≤ φ'(0) = ∇f(x*)^T (x − x*); that is, x* is a solution of (6.31).

Proposition 2. If f (x) is a convex function and x∗ is a solution to VI


(∇f, K), then x∗ is a solution to the optimization problem (6.30).
Proof. Since f (x) is convex,

f (x) ≥ f (x∗ ) + ∇f (x∗ )T .(x − x∗ ), ∀x ∈ K

But ∇f (x∗ )T .(x − x∗ ) ≥ 0, since x∗ is a solution to VI(∇f, K). Therefore,


from (6.31) one concludes that

f (x) ≥ f (x∗ ), ∀x ∈ K

that is, x∗ is a minimum point of the optimization problem (6.30).

If the feasible set K = Rn , then the unconstrained optimization prob-


lem is also a variational inequality problem. On the other hand, in the case
where a certain symmetry holds, the variational inequality problem can be
reformulated as an optimization problem.
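Variational inequalities are commonly solved numerically by the classical projection method, x ← P_K(x − γF(x)), which is not covered in the text. The sketch below uses illustrative data of my own: an affine F(x) = Ax − b with A symmetric positive definite and K the nonnegative orthant (whose projection is a componentwise clip at 0); for small enough γ the iteration is a contraction.

```python
import numpy as np

# Projection method for VI(F, K): iterate x <- P_K(x - gamma * F(x)).
A = np.array([[2.0, 1.0], [1.0, 2.0]])     # symmetric positive definite
b = np.array([-1.0, 2.0])
F = lambda x: A @ x - b

x = np.zeros(2)
gamma = 0.2                                # small step: contraction here
for _ in range(2000):
    x = np.maximum(0.0, x - gamma * F(x))  # P_K = projection onto x >= 0

# For K = R^n_+, the VI conditions read x >= 0, F(x) >= 0, x^T F(x) = 0;
# the limit here is x* = (0, 1) with F(x*) = (2, 0).
print(x, F(x))
```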

6.7 Queuing Theory


Recall the last time that you had to wait at a supermarket checkout counter, wait for a teller at your local bank, or wait to be served at a restaurant. In these and many other waiting line situations, the time spent waiting is undesirable. Since adding more checkout clerks, more bank tellers, or more waiters is not always the most economical strategy for improving service, businesses are trying harder to understand their waiting line characteristics and find ways to keep waiting times within tolerable limits.

Indeed, for instance, there was a time when airline terminal passengers
formed separate queues in front of check-in-counters. But now we see invari-
ably only one line feeding into several counters. This is the result of the real-
ization that a single line policy serves better for the passengers as well as the
airline management.
Such a conclusion emanated from analyzing the mode by which a queue is
formed and a service is provided. The analysis is based on building a math-
ematical model representing the process of arrival of passengers who join the
queue, the rules by which they are allowed into service, and the time it takes
to serve the passengers. In management science terminology a waiting line is
referred to as a queue, and the body of knowledge dealing with waiting lines is
known as queuing theory. In the early 1900s A.K. Erlang, a Danish telephone engineer, began a study of congestion and waiting times in the completion of telephone calls. Since then, queuing theory has grown far more sophisticated and has been applied to a wide variety of waiting line situations.
We identify the unit demanding service, whether it is human or otherwise, as the customer. The unit providing service is known as the server. This terminology of customers and servers is used in a generic sense regardless of the nature of the physical context. Some examples are given below:
(a) In communication systems, voice or data traffic queue up for lines for
transmission. A simple example is the telephone exchange.
(b) In a manufacturing system with several work stations, units completing
work in one station wait for access to the next.
(c) Vehicles requiring service wait for their turn in garage.
(d) Patients arrive at a doctor’s clinic for treatment.
Numerous examples of this type are everyday occurrence. While analyz-
ing them we can identify some basic elements of the systems. These are the
following:

• Input process. In most cases the arrivals are products of external factors. Therefore, input is described in terms of random variables that can represent either the number arriving during a time interval or the time interval between successive arrivals.

• Service mechanism. The uncertainties involved in the service mechanism


are the number of servers, the number of customers served at any time,
and the duration and mode of service. Networks of queues consist of more
than one server arranged in series and/or parallel. Random variables
represent service times, and the number of servers, when appropriate.

• System capacity. The number of customers that can wait at a time in a queueing system is an important factor for consideration. If the waiting room is large, one can assume that, for all practical purposes, it is infinite.

• Queue discipline. Servers follow rules in accepting customers for service. In this context, rules such as "first come, first served" (FCFS), "last come, first served" (LCFS), and "random selection for service" (RS) are self-explanatory.

The identification of these elements leads to symbolically representing


queueing systems with a variety of system elements. The basic representa-
tion widely used in queueing theory is made up of symbols representing three
elements, input/service/number of servers. For instance, we use M for Poisson
or exponential, D for deterministic (constant), Ek for the Erlang distribution
with scale parameter k, and G for general (also GI for general independent).

• Customer waiting time. From a customer view, time spent in the queue
and in the system are two characteristics of importance.

Let Tq and T be the time a customer spends in queue and in the system,
respectively. We assume that the system operates according to a “first come,
first served” (FCFS) queue discipline. With an FCFS queue discipline, the
waiting time for service (Tq ) of an arriving customer is the amount of time
required to serve the customers already in the system. The total time T in
the system is Tq plus the service time.
The ratio of arrival rate to service rate plays a significant role in measuring
the performance of queueing systems.
arrival rate
ρ = traffic intensity = .
service rate

6.7.1 The M/M/1 Queue
The M/M/1 queue is the simplest of the queuing models used in practice.

1. The waiting line has a single channel.


2. The population of potential customers is infinite.
3. The pattern of arrivals follows a Poisson process with mean arrival rate λ.

4. The service time follows an exponential probability distribution with


mean service rate µ.
5. The queue discipline is first-come, first served (FCFS).

Let Q(t) be the number of customers in the system and Qq the number of customers in the queue, excluding the one in service. Then the expected number of customers in the system is given by

L = E(Q) = λ/(μ − λ)   (6.32)

and the expected number of customers in the queue is

Lq = E(Qq) = λ²/(μ(μ − λ))   (6.33)

When there are n customers in the system,
since service times are exponential with parameter µ, the total service time
of n customers is Erlang with probability density

µn xn−1
fn (x) = e−µx .
(n − 1)!

Let Fq(t) = P(Tq ≤ t) be the distribution function of the waiting time Tq. Then we have

Fq(t) = 1 − ρ e^{−µ(1−ρ)t}.    (6.34)

Now let Wq = E(Tq) be the expected waiting time in the queue; then

Wq = E(Tq) = ρ/(µ(1 − ρ)) = λ/(µ(µ − λ)).

And, since the total time in the system, T, is the sum of Tq and the service time, we get

W = E(T) = λ/(µ(µ − λ)) + 1/µ = 1/(µ − λ).    (6.35)

Combining results (6.32) and (6.35), we note the relationship

L = λW.    (6.36)

A similar comparison of (6.33) with the expression for Wq establishes

Lq = λWq.    (6.37)

The formula (6.36) is known as Little's law in the queueing literature.


468 Modern Engineering Mathematics

Example 246. An airport has a single runway. Airplanes have been found
to arrive at the rate of 15 per hour and it is estimated that each landing
takes 3 minutes. Assuming a Poisson process for arrivals and an exponential
distribution for landing times, we use the M/M/1 model to determine the
following performance measures.

(a) Runway utilization:

arrival rate λ = 15 per hour,
service rate µ = 60/3 = 20 per hour,
utilization ρ = λ/µ = 15/20 = 3/4.
(b) Expected number of airplanes waiting to land:

Lq = ρ^2/(1 − ρ) = (0.75)^2/0.25 = 2.25.

(c) Expected waiting time:

Wq = E(Tq) = λ/(µ(µ − λ)) = 15/(20(20 − 15)) = 3/20 hour = 9 minutes.

(d) Probability that the waiting time will be more than 5 minutes, more than 10 minutes, and of no waiting:

P(no waiting) = P(Tq = 0) = 1 − ρ = 0.25.

Since

P(Tq > t) = 1 − P(Tq ≤ t) = ρ e^{−µ(1−ρ)t},

then

P(Tq > 5 minutes) = (3/4) e^{−20(1−3/4)(5/60)} = (3/4) e^{−25/60} = 0.4944

and

P(Tq > 10 minutes) = (3/4) e^{−50/60} = 0.3259.

(e) Expected number of landings in a 20-minute period = (15/60) × 20 = 5.
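The computations of this example can be reproduced with a short Python sketch. The helper names (mm1_measures, mm1_wait_exceeds) are ours, not the text's; the formulas are (6.32)-(6.35) above.

```python
# Sketch of the M/M/1 formulas, reproducing Example 246.
# Function names are illustrative, not from the text.
import math

def mm1_measures(lam, mu):
    """Return (rho, L, Lq, Wq, W) for an M/M/1 queue with
    arrival rate lam and service rate mu (requires lam < mu)."""
    rho = lam / mu                     # traffic intensity
    L = lam / (mu - lam)               # mean number in system, (6.32)
    Lq = lam**2 / (mu * (mu - lam))    # mean number in queue, (6.33)
    Wq = lam / (mu * (mu - lam))       # mean wait in queue
    W = 1.0 / (mu - lam)               # mean time in system, (6.35)
    return rho, L, Lq, Wq, W

def mm1_wait_exceeds(lam, mu, t):
    """P(Tq > t) = rho * exp(-mu (1 - rho) t), from (6.34)."""
    rho = lam / mu
    return rho * math.exp(-mu * (1 - rho) * t)

# Example 246: 15 arrivals/hour, 20 landings/hour.
rho, L, Lq, Wq, W = mm1_measures(15, 20)
print(rho, Lq, Wq * 60)                   # 0.75, 2.25, 9 minutes
print(mm1_wait_exceeds(15, 20, 5 / 60))   # about 0.4944
```

Note that Little's law (6.36) can be checked directly: L = 3 equals λW = 15 × (1/5).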
60

6.7.2 The M/M/s Queue
The multiserver queue M/M/s is the most widely used model. It aims at analyzing service stations with more than one server, such as banks, checkout counters in stores, and check-in counters in airports. The arrival of customers is assumed to follow a Poisson process, service times are assumed to have an exponential distribution, and the number of servers is s. These servers are assumed to provide service independently of each other. The arriving customers are assumed to form a single queue, and the one at the head of the waiting line enters service as soon as a server is free.
Let λ be the arrival rate and µ the service rate. Denoting by L and Lq the mean number of customers in the system and the number in the queue, respectively, and setting ρ = λ/(sµ) and α = λ/µ, following (6.32) we get

L = α + ρ α^s p0/(s!(1 − ρ)^2),

Lq = ρ α^s p0/(s!(1 − ρ)^2),    (6.38)

where

p0 = [ Σ_{r=0}^{s−1} α^r/r! + (α^s/s!)(1 − α/s)^{−1} ]^{−1}.
The expected waiting time in the queue is then

Wq = E(Tq) = α^s p0/(s! sµ(1 − ρ)^2).    (6.39)

Comparing (6.38) with (6.39), we again have Little's law formula

Lq = λWq.

Let Tq be the waiting time and Fq(t) = P[Tq ≤ t] its distribution function. Then

Fq(t) = 1 − (α^s p0/(s!(1 − ρ))) e^{−sµ(1−ρ)t}.
s!(1 − ρ)
Example 247. In the airport problem of Example 246, how would the performance measures change if there were two runways, assuming the same arrival and service rates?

(a) Runway utilization:

arrival rate λ = 15 per hour,
service rate µ = 20 per hour,
number of servers s = 2,
utilization of each runway ρ = λ/(sµ) = 15/40 = 3/8.

(b) Expected number of airplanes waiting to land:

Lq = ρ α^s p0/(s!(1 − ρ)^2)

with

p0 = [ Σ_{r=0}^{s−1} α^r/r! + (α^s/s!)(1 − α/s)^{−1} ]^{−1}
   = [ 1 + 3/4 + ((3/4)^2/2)(1 − 3/8)^{−1} ]^{−1} = 0.4545,

so

Lq = (3/8)(3/4)^2(0.4545) / (2(5/8)^2) = 0.1227.
(c) Expected waiting time:

Wq = α^s p0/(s! sµ(1 − ρ)^2) = (3/4)^2(0.4545) / (2 × 2 × 20 (1 − 3/8)^2)
   = 0.00818 hour = 0.49 minute.

(d) Probability that the waiting time will be more than 5 minutes? 10 minutes? No waiting?

P(no waiting) = Fq(0) = 1 − α^s p0/(s!(1 − ρ)) = 1 − (3/4)^2(0.4545)/(2(1 − 3/8)) = 0.7955.

Since

P(Tq > t) = (α^s p0/(s!(1 − ρ))) e^{−sµ(1−ρ)t} = 0.2045 e^{−25t}  (t in hours),

P(Tq > 5 minutes) = 0.2045 e^{−25/12} = 0.0255

and

P(Tq > 10 minutes) = 0.2045 e^{−25/6} = 0.0032.


Expected number of landings in a 20-minute period = (15/60) × 20 = 5.
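Example 247 can likewise be checked numerically. The sketch below implements (6.38) and (6.39); the name mms_measures is illustrative, not from the text.

```python
# Sketch of the M/M/s formulas (6.38)-(6.39); names are illustrative.
import math

def mms_measures(lam, mu, s):
    """Return (rho, p0, Lq, Wq) for an M/M/s queue."""
    alpha = lam / mu
    rho = lam / (s * mu)
    # empty-system probability p0 from the formula below (6.38)
    p0 = 1.0 / (sum(alpha**r / math.factorial(r) for r in range(s))
                + alpha**s / math.factorial(s) / (1 - rho))
    Lq = rho * alpha**s * p0 / (math.factorial(s) * (1 - rho) ** 2)    # (6.38)
    Wq = alpha**s * p0 / (math.factorial(s) * s * mu * (1 - rho) ** 2)  # (6.39)
    return rho, p0, Lq, Wq

# Example 247: two runways, lam = 15, mu = 20.
rho, p0, Lq, Wq = mms_measures(15, 20, 2)
print(rho, round(p0, 4), round(Lq, 4), round(Wq * 60, 2))
```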

6.8 Iterative Methods and Pre-conditioning


6.8.1 Norms of Vectors and Matrices
Definition 85. A vector norm on R^n is a function ‖·‖ from R^n into R with the following properties:

(i) ‖x‖ ≥ 0 for all x ∈ R^n;
(ii) ‖x‖ = 0 if and only if x = 0;
(iii) ‖αx‖ = |α| ‖x‖ for all α ∈ R and x ∈ R^n;
(iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n.

Definition 86. The most commonly used norms on R^n are the l1, l2 and l∞ norms. For x = (x1, x2, ..., xn)^t they are defined by

‖x‖∞ = max_{1≤i≤n} |xi|,  ‖x‖1 = Σ_{i=1}^{n} |xi|,  ‖x‖2 = ( Σ_{i=1}^{n} xi^2 )^{1/2}.

Example 248. The vector x = (−1, 1, −2) in R^3 has norms

‖x‖2 = √((−1)^2 + (1)^2 + (−2)^2) = √6,
‖x‖∞ = max{|−1|, |1|, |−2|} = 2,
‖x‖1 = Σ_{i=1}^{3} |xi| = |−1| + |1| + |−2| = 4.
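A minimal Python sketch of the three norms of Definition 86, checked against Example 248 (the helper names are ours):

```python
# The l1, l2 and l-infinity vector norms; names are illustrative.
import math

def norm_1(x):
    return sum(abs(xi) for xi in x)

def norm_2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    return max(abs(xi) for xi in x)

x = [-1, 1, -2]                            # the vector of Example 248
print(norm_2(x), norm_inf(x), norm_1(x))   # sqrt(6), 2, 4
```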

Definition 87. A matrix norm on the set of n × n matrices is a real-valued function ‖·‖ defined on this set, satisfying for all n × n matrices A and B and all real numbers α:

‖A‖ ≥ 0;
‖A‖ = 0 if and only if A = 0, the matrix with all 0 entries;
‖αA‖ = |α| ‖A‖;
‖A + B‖ ≤ ‖A‖ + ‖B‖;
‖AB‖ ≤ ‖A‖ ‖B‖.

The only norms we consider are those that are natural consequences of the vector norms l1, l2 and l∞. Let A = (aij) be an n × n matrix; then

‖A‖∞ = max_{1≤i≤n} Σ_{j=1}^{n} |aij|,  ‖A‖1 = max_{1≤j≤n} Σ_{i=1}^{n} |aij|,  ‖A‖2 = √(ρ(A^T A)),

where ρ(A^T A) is the largest eigenvalue of the symmetric positive semidefinite matrix A^T A. If A is a real symmetric and positive definite matrix, then ‖A‖2 = |λmax| = ρ(A).
Example 249. Compute the l∞, l1 and l2 norms of

A = [  1  1  0 ]
    [  1  2  1 ]
    [ −1  1  2 ]

Solution:

Row sums:

Σ_{j=1}^{3} |a1j| = |1| + |1| + |0| = 2,
Σ_{j=1}^{3} |a2j| = |1| + |2| + |1| = 4,
Σ_{j=1}^{3} |a3j| = |−1| + |1| + |2| = 4,

so ‖A‖∞ = max_{1≤i≤3} Σ_{j=1}^{3} |aij| = 4.

Column sums:

Σ_{i=1}^{3} |ai1| = |1| + |1| + |−1| = 3,
Σ_{i=1}^{3} |ai2| = |1| + |2| + |1| = 4,
Σ_{i=1}^{3} |ai3| = |0| + |1| + |2| = 3,

so ‖A‖1 = max_{1≤j≤3} Σ_{i=1}^{3} |aij| = 4.

Next,

A^T A = [ 1 1 −1 ] [  1 1 0 ]   [  3 2 −1 ]
        [ 1 2  1 ] [  1 2 1 ] = [  2 6  4 ]
        [ 0 1  2 ] [ −1 1 2 ]   [ −1 4  5 ]

To calculate |λmax| we need the eigenvalues of A^T A, that is, the roots of

det(A^T A − λI) = 0,

or

det [ 3−λ   2   −1  ]
    [  2   6−λ   4  ] = −λ^3 + 14λ^2 − 42λ = −λ(λ^2 − 14λ + 42) = 0.
    [ −1    4   5−λ ]

Then λ = 0 or λ = 7 ± √7, so

‖A‖2 = √(ρ(A^T A)) = √( max{0, 7 − √7, 7 + √7} ) = √(7 + √7) ≈ 3.106.
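The three matrix norms of Example 249 can be verified numerically. In the sketch below, ρ(A^T A) is estimated by power iteration, a standard substitute for the hand eigenvalue computation above; all function names are ours.

```python
# Matrix norms of Example 249; names are illustrative, and the l2 norm
# is estimated by power iteration on A^T A rather than by hand.
import math

A = [[1, 1, 0], [1, 2, 1], [-1, 1, 2]]

def norm_inf(A):                        # maximum absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def norm_1(A):                          # maximum absolute column sum
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_2(A, iters=200):
    """sqrt of the largest eigenvalue of A^T A, by power iteration."""
    n = len(A)
    B = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]             # B = A^T A
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / s for wi in w]
    Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Bv[i] for i in range(n))   # Rayleigh quotient
    return math.sqrt(lam)

print(norm_inf(A), norm_1(A), round(norm_2(A), 3))   # 4, 4, 3.106
```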

6.8.2 General Iterative Method

In this section we describe the Jacobi and the Gauss-Seidel iterative methods, classical methods that date back to the nineteenth century. Iterative techniques are rarely used for solving linear systems of small dimension, since the time required for sufficient accuracy exceeds that required for direct techniques such as Gaussian elimination. For large systems, however, these methods are efficient in terms of both computer storage and computation. Systems of this type arise frequently in circuit analysis and in the numerical solution of boundary-value problems and partial differential equations.
An iterative technique to solve the n × n linear system

Ax = b    (6.40)

starts with an initial approximation x^(0) to the solution x and generates a sequence of vectors {x^(k)}_{k=0}^{∞} that converges to x. Iterative techniques involve a process that converts the system Ax = b into an equivalent system of the form x = Tx + c for some fixed matrix T and vector c. After the initial vector x^(0) is selected, the sequence of approximate solution vectors is generated by computing

x^(k+1) = Tx^(k) + c,  k = 0, 1, 2, ...    (6.41)

Convergence analysis of iterative methods for both linear and nonlinear systems is based on the following important result from mathematical analysis, known as the Banach fixed point theorem or contraction mapping theorem; for a proof of this theorem we refer to [18] or [19].
Theorem 55. Let K be a closed and bounded subset of R^n, equipped with a norm ‖·‖. Suppose that f : K → K is a function such that there is a positive constant q < 1 with the property

‖f(x) − f(y)‖ ≤ q ‖x − y‖    (6.42)

for all x, y ∈ K. Then there is a unique point x* ∈ K such that x* = f(x*). Moreover, the sequence {x^(k)}_{k≥0}, defined recursively by

x^(k+1) = f(x^(k)),  k = 0, 1, 2, ...    (6.43)

converges to x* for any initial guess x^(0) ∈ K.


Theorem 56. The following inequalities hold and describe the rate of convergence of the sequence {x^(k)}_{k≥0}:

‖x^(k) − x*‖ ≤ (q/(1 − q)) ‖x^(k) − x^(k−1)‖,  k = 1, 2, ...    (6.44)

or equivalently

‖x^(k) − x*‖ ≤ (q^k/(1 − q)) ‖x^(1) − x^(0)‖,  k = 1, 2, ...    (6.45)

For a proof of this theorem we refer to [18] and [19].
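Theorems 55 and 56 can be illustrated with a scalar contraction. The sketch below iterates f(x) = cos x on K = [0, 1], where q = sin(1) < 1 is a valid contraction constant since |f'(x)| = |sin x| ≤ sin(1) on [0, 1], and checks the a priori bound (6.45); the limiting fixed point of cos is the well-known Dottie number.

```python
# Fixed point iteration for f(x) = cos x on K = [0, 1],
# checking the error bound (6.45). A self-contained illustration.
import math

f = math.cos
q = math.sin(1.0)            # contraction constant on [0, 1]

x0 = 1.0                     # initial guess x^(0)
bound0 = abs(f(x0) - x0)     # ||x^(1) - x^(0)||

xs = [x0]
for _ in range(40):
    xs.append(f(xs[-1]))     # x^(k+1) = f(x^(k)), as in (6.43)

x_star = 0.7390851332151607  # fixed point of cos (Dottie number)
for k, xk in enumerate(xs):
    # a priori bound (6.45): ||x^(k) - x*|| <= q^k/(1-q) ||x^(1) - x^(0)||
    assert abs(xk - x_star) <= q**k / (1 - q) * bound0 + 1e-12
print(abs(xs[-1] - x_star))  # very small
```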


Next we describe some iterative methods for solving a linear system.

6.8.3 Jacobi Iterative Method

We shall describe this method by the following example.

Example 250. Use the Jacobi method to solve the linear system

(e1) 10x1 − x2 + 2x3 = 6    (6.46)
(e2) −x1 + 11x2 − x3 + 3x4 = 25
(e3) 2x1 − x2 + 10x3 − x4 = −11
(e4) 3x2 − x3 + 8x4 = 15

which has the unique solution x = (1, 2, −1, 1)^t.


Solution: We first convert Ax = b by solving equation (ei) for xi, for each i = 1, 2, 3, 4, to obtain

x1 = (1/10)x2 − (1/5)x3 + 3/5
x2 = (1/11)x1 + (1/11)x3 − (3/11)x4 + 25/11
x3 = −(1/5)x1 + (1/10)x2 + (1/10)x4 − 11/10
x4 = −(3/8)x2 + (1/8)x3 + 15/8

TABLE 6.5

k         0      1        2        3        4        5
x1^(k)    0.000  0.6000   1.0473   0.9326   1.0152   0.9890
x2^(k)    0.000  2.2727   1.7159   2.053    1.9537   2.0114
x3^(k)    0.000  −1.1000  −0.8052  −1.0493  −0.9681  −1.0103
x4^(k)    0.000  1.8750   0.8852   1.1309   0.9739   1.0214

k         6      7        8        9        10
x4^(k)    0.9944 1.0036   0.9989   1.0006   0.9998

by solving for xi in each row. That is, starting from an initial guess x^(0) = (0, 0, 0, 0)^t, we compute the iterates from

x1^(k) = (1/10)x2^(k−1) − (1/5)x3^(k−1) + 3/5
x2^(k) = (1/11)x1^(k−1) + (1/11)x3^(k−1) − (3/11)x4^(k−1) + 25/11
x3^(k) = −(1/5)x1^(k−1) + (1/10)x2^(k−1) + (1/10)x4^(k−1) − 11/10
x4^(k) = −(3/8)x2^(k−1) + (1/8)x3^(k−1) + 15/8

Some iterates, x^(k) = (x1^(k), x2^(k), x3^(k), x4^(k))^t, are displayed in Table 6.5.

The decision to stop after ten iterations was based on the criterion

‖x^(10) − x^(9)‖∞ / ‖x^(10)‖∞ = (8.0 × 10^{−4})/1.9998 < 10^{−3}.

In general, the Jacobi iterative method consists of solving the ith equation in Ax = b for xi to obtain (provided aii ≠ 0)

xi = Σ_{j=1, j≠i}^{n} (−aij xj/aii) + bi/aii,  for i = 1, 2, ..., n    (6.47)

and generating each xi^(k) from the components of x^(k−1), for k ≥ 1, by

xi^(k) = Σ_{j=1, j≠i}^{n} (−aij xj^(k−1)/aii) + bi/aii,  for i = 1, 2, ..., n.
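A compact Python sketch of the Jacobi iteration (6.47), applied to the system (6.46) of Example 250. The helper name and the relative stopping test (the criterion used in that example) are ours.

```python
# Jacobi iteration (6.47); names and stopping test are illustrative.
def jacobi(A, b, x0, tol=1e-3, max_iter=50):
    n = len(A)
    x = x0[:]
    for k in range(1, max_iter + 1):
        # every component of x^(k) uses only components of x^(k-1)
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
                 / A[i][i] for i in range(n)]
        # relative stopping criterion, as in Example 250
        if max(abs(x_new[i] - x[i]) for i in range(n)) \
                / max(abs(v) for v in x_new) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

A = [[10, -1, 2, 0], [-1, 11, -1, 3], [2, -1, 10, -1], [0, 3, -1, 8]]
b = [6, 25, -11, 15]
x, k = jacobi(A, b, [0.0, 0.0, 0.0, 0.0])
print(k, [round(v, 4) for v in x])   # x near (1, 2, -1, 1)
```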

6.8.4 The Gauss-Seidel Method

A possible improvement of the Jacobi method can be obtained by using the components x1^(k), ..., x_{i−1}^(k) already computed (for i > 1) to compute the component xi^(k). That is, we can use

xi^(k) = [ −Σ_{j=1}^{i−1} aij xj^(k) − Σ_{j=i+1}^{n} aij xj^(k−1) + bi ] / aii,  for i = 1, 2, ..., n.    (6.48)
Example 251. Use the Gauss-Seidel method to solve the linear system (6.46) solved in Example 250 by the Jacobi method.

Solution: Let D be the diagonal matrix whose diagonal entries are those of A, −L the strictly lower-triangular part of A, and −U the strictly upper-triangular part of A; we shall return to this splitting shortly. The Gauss-Seidel method applied to this system yields

x1^(k) = (1/10)x2^(k−1) − (1/5)x3^(k−1) + 3/5
x2^(k) = (1/11)x1^(k) + (1/11)x3^(k−1) − (3/11)x4^(k−1) + 25/11
x3^(k) = −(1/5)x1^(k) + (1/10)x2^(k) + (1/10)x4^(k−1) − 11/10
x4^(k) = −(3/8)x2^(k) + (1/8)x3^(k) + 15/8
For x^(0) = (0, 0, 0, 0)^t, the Gauss-Seidel method generates the iterates in Table 6.6. As

‖x^(5) − x^(4)‖∞ / ‖x^(5)‖∞ = 0.0008/2.000 = 4 × 10^{−4},

x^(5) is accepted as a reasonable approximation to the solution. Note that the Jacobi method in Example 250 required twice as many iterations for the same accuracy.

With the splitting introduced above, the equation

Ax = b

TABLE 6.6

k         0      1        2       3        4        5
x1^(k)    0.000  0.6000   1.030   1.0065   1.0009   1.0001
x2^(k)    0.000  2.3272   2.037   2.0036   2.0003   2.0000
x3^(k)    0.000  −0.9873  −1.014  −1.0025  −1.0003  −1.0000
x4^(k)    0.000  0.8789   0.9844  0.9983   0.9999   1.0000

becomes

(D − L − U)x = b.

One can then check that the Jacobi iterative method is of the form

x^(k+1) = Tx^(k) + c    (6.49)

with

T = D^{−1}(L + U) and c = D^{−1}b;

concretely, for the system of Example 250,

T = [  0      1/10   −1/5    0    ]        [  3/5   ]
    [  1/11   0       1/11  −3/11 ]        [  25/11 ]
    [ −1/5    1/10    0      1/10 ]  , c = [ −11/10 ],
    [  0     −3/8     1/8    0    ]        [  15/8  ]

while the Gauss-Seidel method is also an iterative method of the form (6.49) with

T = (D − L)^{−1}U and c = (D − L)^{−1}b.

Theorem 57. For any x^(0) ∈ R^n, the sequence {x^(k)}_{k≥1} defined in (6.41) converges to the unique solution of x = Tx + c, and therefore to the solution of system (6.40), if and only if the spectral radius of T satisfies ρ(T) < 1.

6.9 Krylov Methods

Let A be an n × n matrix and v a vector in R^n. A Krylov subspace is a space of the form

Km(A, v) = Span{v, Av, A^2 v, ..., A^{m−1} v}.

An important property of Krylov subspaces is that Km is the space of all vectors of R^n which can be written as x = p(A)v, where p is a polynomial of degree not exceeding m − 1.

6.9.1 Arnoldi's Orthogonalization Method

Arnoldi's method is an orthogonal projection method onto Km for general non-Hermitian matrices. The procedure was introduced in 1951 as a means of reducing a dense matrix to Hessenberg form.

Arnoldi's procedure is an algorithm for constructing an orthonormal basis of the Krylov subspace Km. Below is one variant of this algorithm.
Algorithm
1. Choose a vector v1 of norm 1 (‖v1‖2 = 1)
2. For j = 1, 2, ..., m Do:
3. Compute hij = (Avj, vi) for i = 1, 2, ..., j
4. Compute wj = Avj − Σ_{i=1}^{j} hij vi
5. h_{j+1,j} = ‖wj‖2
6. If h_{j+1,j} = 0, then stop
7. v_{j+1} = wj/h_{j+1,j}
8. EndDo

At each step, the algorithm multiplies the previous Arnoldi vector vj by A and then orthogonalizes the resulting vector wj against all previous vi's by a standard Gram-Schmidt procedure.

Proposition 3. Denote by Vm the n × m matrix with column vectors v1, ..., vm, by H̄m the (m + 1) × m Hessenberg matrix whose nonzero entries hij are defined by Arnoldi's algorithm, and by Hm the matrix obtained from H̄m by deleting its last row. Then the following relations hold:

AVm = Vm+1 H̄m,
Vm^T A Vm = Hm.    (6.50)
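Arnoldi's procedure and the first relation of (6.50) can be checked numerically; the sketch below uses an arbitrary 3 × 3 test matrix, and all names are ours.

```python
# Arnoldi's procedure (steps 1-8 above) and a check of A V_m = V_{m+1} H_m.
import math

def arnoldi(A, v1, m):
    """Return the orthonormal vectors v_1..v_{m+1} and the
    (m+1) x m Hessenberg matrix of Proposition 3."""
    n = len(A)
    V = [v1[:]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = [sum(A[r][t] * V[j][t] for t in range(n)) for r in range(n)]  # A v_j
        for i in range(j + 1):                 # Gram-Schmidt against v_1..v_j
            H[i][j] = sum(w[t] * V[i][t] for t in range(n))
            w = [w[t] - H[i][j] * V[i][t] for t in range(n)]
        H[j + 1][j] = math.sqrt(sum(wt * wt for wt in w))
        if H[j + 1][j] == 0.0:                 # breakdown (step 6)
            break
        V.append([wt / H[j + 1][j] for wt in w])
    return V, H

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]          # arbitrary test matrix
V, H = arnoldi(A, [1.0, 0.0, 0.0], 2)

# verify A V_m = V_{m+1} H_m from (6.50), column by column
for j in range(2):
    Avj = [sum(A[r][t] * V[j][t] for t in range(3)) for r in range(3)]
    recon = [sum(H[i][j] * V[i][r] for i in range(len(V))) for r in range(3)]
    assert all(abs(Avj[r] - recon[r]) < 1e-12 for r in range(3))
print("relation (6.50) holds")
```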

6.9.2 Arnoldi's Method for Linear Systems

Consider the linear system

Ax = b.

Given an initial guess x0 to this system and the Krylov subspace

Km(A, r0) = Span{r0, Ar0, A^2 r0, ..., A^{m−1} r0},

where r0 = b − Ax0, Arnoldi's method seeks an approximate solution xm from the affine subspace x0 + Km of dimension m by imposing the so-called Galerkin condition

b − Axm ⊥ Km.

If we take v1 = r0/‖r0‖2 in Arnoldi's method and set β = ‖r0‖2, then, by (6.50),

Vm^T r0 = Vm^T (βv1) = βe1,

which yields

xm = x0 + Vm ym,  ym = Hm^{−1}(βe1),  e1 = (1, 0, ..., 0)^t.

Algorithm
1. Compute r0 = b − Ax0, β = ‖r0‖2, and v1 = r0/β
2. Define the (m + 1) × m matrix H̄m = (hij); set H̄m = 0
3. For j = 1, 2, ..., m Do:
4. Compute wj = Avj
5. For i = 1, ..., j Do:
6. hij = (wj, vi)
7. wj = wj − hij vi
8. EndDo
9. Compute h_{j+1,j} = ‖wj‖2. If h_{j+1,j} = 0, set m = j and GoTo 12
10. Compute v_{j+1} = wj/h_{j+1,j}
11. EndDo
12. Compute ym = Hm^{−1}(βe1) and xm = x0 + Vm ym.

6.9.3 Conjugate Gradient Method

Consider the linear system

Ax = b    (6.51)

where A is symmetric and positive definite. Then it is easily verified that solving (6.51) is equivalent to minimizing the quadratic functional

φ(v) = (1/2)(Av, v) − (b, v),

as the minimum value of φ is −(1/2)(A^{−1}b, b), attained at x = A^{−1}b. We also notice that

−∇φ(x) = b − Ax = r,

where r is known as the residual vector for x. Note that r = 0 if and only if x is
the solution, and so the size of r measures, in a certain sense, how accurately
x comes to solving the system. Moreover, the residual vector indicates the
direction of steepest decrease in the quadratic function, and is thus a good
choice of direction to head off in search the true minimizer.
The initial result is the gradient descent algorithm, in which each successive
approximation uk to the solution is obtained by going a certain distance in
the residual direction:

xk+1 = xk + αk rk , where rk = b − Axk (6.52)

The conjugate gradient method is one of the best iterative techniques for solving linear systems

Ax = b

with A an n × n symmetric positive definite matrix. The method can be viewed as an orthogonal projection technique onto the Krylov subspace Km(r0, A), where r0 is the initial residual. In the conjugate gradient method the update of the iterate is obtained from (6.52) by replacing r^k with a new direction p^k that is not parallel to the gradient. These methods were originally proposed by Hestenes and Stiefel [20] (see also Hestenes [21]) as direct methods for symmetric positive definite linear systems, as (in the absence of round-off errors) they produce the exact solution in a finite number of steps. Nowadays they are implemented as iterative schemes which are capable of producing very accurate results in a small number of iterations.

In the conjugate gradient (CG) method, the directions {p^k} are A-conjugate, that is, they satisfy the orthogonality property (p^j, Ap^m) = 0 for m ≠ j. In particular,

(p^{k+1}, Ap^k) = 0 for all k = 0, 1, ...

The idea of the method is based on the following remark. Let p^1, ..., p^m be linearly independent vectors, let x^0 be an initial guess, and construct the solution x by successive approximation, with the kth iterate having the form

x^k = α1 p^1 + ... + αk p^k,  so that  x^{k+1} = x^k + α_{k+1} p^{k+1},

where the αk are non-zero real numbers. The idea is not to try to specify the conjugate basis vectors in advance, but rather to successively construct them during the course of the algorithm. We begin, merely for convenience, with an initial guess x^0 = 0 for the solution. The residual vector r^0 = b − Ax^0 = b indicates the direction of steepest decrease of φ at x^0, and we update the original guess by moving in this direction, taking p^1 = r^0 = b as our first conjugate direction. The next iterate is x^1 = x^0 + α1 p^1, and we choose the parameter α1 so that the corresponding residual vector

r^1 = b − Ax^1 = r^0 − α1 Ap^1

is as close to zero as possible. This occurs when r^1 is orthogonal to r^0, and so we require

0 = (r^0)^T r^1 = ‖r^0‖^2 − α1 (r^0)^T Ap^1 = ‖r^0‖^2 − α1 (r^0, p^1) = ‖r^0‖^2 − α1 (p^1, p^1),

where (u, v) = u^T Av denotes the inner product induced by A. Therefore

α1 = ‖r^0‖^2/(p^1, p^1),

and so

x^1 = x^0 + (‖r^0‖^2/(p^1, p^1)) p^1

is our new approximation to the solution. We can assume that α1 ≠ 0, since otherwise the residual r^0 = 0; in this case x^0 = 0 would be the exact solution of the system, and there would be no reason to continue the procedure.
Now let us set p^2 = r^1 + s1 p^1, where the scalar factor s1 is determined by the A-orthogonality requirement

0 = (p^1, p^2) = (p^1, r^1 + s1 p^1) = (r^1, p^1) + s1 (p^1, p^1),

so

s1 = −(r^1, p^1)/(p^1, p^1).

Now, using the orthogonality of r^0 and r^1, we get

(r^1, p^1) = (r^1)^T Ap^1 = (r^1)^T (r^0 − r^1)/α1 = −‖r^1‖^2/α1,

while

(p^1, p^1) = (p^1)^T Ap^1 = ‖r^0‖^2/α1.

Therefore, the second conjugate direction is

p^2 = r^1 + s1 p^1,  where s1 = ‖r^1‖^2/‖r^0‖^2.

We then update

x^2 = x^1 + α2 p^2 = x^0 + α1 p^1 + α2 p^2 = α1 p^1 + α2 p^2

so as to make the corresponding residual vector

r^2 = b − Ax^2 = r^1 − α2 Ap^2

as small as possible, which is accomplished by requiring it to be orthogonal to r^1. Thus, using the A-orthogonality of p^1 and p^2,

0 = (r^1)^T r^2 = ‖r^1‖^2 − α2 (r^1)^T Ap^2 = ‖r^1‖^2 − α2 (p^2, p^2),

and so

α2 = ‖r^1‖^2/(p^2, p^2).
Continuing in this manner, at the kth stage we have already constructed the conjugate vectors p^1, ..., p^k and the solution approximation x^k as a suitable linear combination of them. The next conjugate direction, given by

p^{k+1} = r^k + sk p^k,  where sk = ‖r^k‖^2/‖r^{k−1}‖^2,

results from the A-orthogonality requirement (p^i, p^{k+1}) = 0 for i ≤ k. The updated solution approximation

x^{k+1} = x^k + α_{k+1} p^{k+1},  where α_{k+1} = ‖r^k‖^2/(p^{k+1}, p^{k+1}),

is then specified so as to make the corresponding residual

r^{k+1} = b − Ax^{k+1} = r^k − α_{k+1} Ap^{k+1}

as small as possible, by requiring that it be orthogonal to r^k.


Therefore the conjugate gradient method can be formulated as follows.

Algorithm
1. Compute r^0 = b − Ax^0 and set p^1 = r^0
2. For k = 1, 2, ..., until convergence, Do:
3. αk = ‖r^{k−1}‖^2/(p^k, p^k), where (p^k, p^k) = (p^k)^T Ap^k
4. x^k = x^{k−1} + αk p^k
5. r^k = r^{k−1} − αk Ap^k
6. sk = ‖r^k‖^2/‖r^{k−1}‖^2
7. p^{k+1} = r^k + sk p^k
8. EndDo
Observe that the algorithm does not require solving any linear systems: apart from the multiplication of a matrix times a vector to evaluate Ap^k, all other operations are rapidly evaluated Euclidean dot products. Unlike Gaussian elimination, the method produces a sequence of successive approximations x^1, x^2, ... to the solution x, and so the iteration can be stopped as soon as a desired solution accuracy is reached, which can be assessed by checking how close the successive iterates are to each other. On the other hand, unlike purely iterative methods, the conjugate gradient method does eventually terminate at the exact solution, because there are at most n conjugate directions, forming an orthogonal basis of R^n for the inner product induced by A. Therefore x^n = α1 p^1 + ... + αn p^n = x must be the solution, since its residual r^n = b − Ax^n is orthogonal to all the conjugate basis vectors p^1, ..., p^n and hence must be zero.
Example 253. Solve the linear system Ax = b by the conjugate gradient method with

A = [  3 −1  0 ]        [  1 ]
    [ −1  2  1 ] ,  b = [  2 ],
    [  0  1  1 ]        [ −1 ]

whose exact solution is x = (2, 5, −6)^T.

Solution: Implement the conjugate gradient method starting from the initial guess x^0 = (0, 0, 0)^T. The corresponding residual vector is r^0 = b − Ax^0 = b = (1, 2, −1)^T, and the first conjugate direction is p^1 = r^0 = (1, 2, −1)^T. Use the formula to obtain the updated approximation to the solution

x^1 = x^0 + (‖r^0‖^2/(p^1, p^1)) p^1 = (6/4)(1, 2, −1)^T = (3/2, 3, −3/2)^T,

noting that (p^1, p^1) = (p^1)^T Ap^1 = 4. In the next stage of the algorithm, we compute the residual r^1 = b − Ax^1 = (−1/2, −1, −5/2)^T. The next conjugate direction is

p^2 = r^1 + (‖r^1‖^2/‖r^0‖^2) p^1 = (−1/2, −1, −5/2)^T + (5/4)(1, 2, −1)^T = (3/4, 3/2, −15/4)^T,

which, as designed, satisfies the conjugacy condition (p^1, p^2) = (p^1)^T Ap^2 = 0.
The ensuing approximation is

x^2 = x^1 + (‖r^1‖^2/(p^2, p^2)) p^2 = (3/2, 3, −3/2)^T + ((15/2)/(27/4))(3/4, 3/2, −15/4)^T = (7/3, 14/3, −17/3)^T.

Since we are dealing with a 3 × 3 system, we will recover the exact solution by one more iteration of the algorithm. The new residual is r^2 = b − Ax^2 =

(−4/3, 2/3, 0)^T. The final conjugate direction is

p^3 = r^2 + (‖r^2‖^2/‖r^1‖^2) p^2 = (−4/3, 2/3, 0)^T + ((20/9)/(15/2))(3/4, 3/2, −15/4)^T = (−10/9, 10/9, −10/9)^T,

which, as is easily checked, is conjugate to both p^1 and p^2. Thus, the solution is obtained from

x^3 = x^2 + (‖r^2‖^2/(p^3, p^3)) p^3 = (7/3, 14/3, −17/3)^T + ((20/9)/(200/27))(−10/9, 10/9, −10/9)^T = (2, 5, −6)^T.

6.9.4 Preconditioned Conjugate Gradient Method

If the matrix A is ill conditioned, the conjugate gradient method is highly sensitive to rounding errors. So, although the exact answer should be obtained in n steps, this is not always the case. The main benefit of the conjugate gradient method is as an iterative method applied to a better conditioned system; in this case an acceptable approximate solution is often obtained in about √n steps.

To apply the conjugate gradient method to a better-conditioned system, we select a non-singular conditioning matrix C so that

Â = C^{−1}A(C^{−1})^T

is better conditioned.
Consider the linear system

Âx̂ = b̂,

where x̂ = C^T x and b̂ = C^{−1}b. Then

Âx̂ = C^{−1}A(C^{−1})^T C^T x = C^{−1}Ax,

which means we can solve Âx̂ = b̂ for x̂ and then obtain x = (C^T)^{−1} x̂. Since x̂^k = C^T x^k, we have

r̂^k = b̂ − Âx̂^k = C^{−1}b − C^{−1}A(C^{−1})^T C^T x^k = C^{−1}(b − Ax^k) = C^{−1}r^k.

Let p̂^k = C^T p^k and w^k = C^{−1}r^k. Then

ŝk = (r̂^k, r̂^k)/(r̂^{k−1}, r̂^{k−1}) = (C^{−1}r^k, C^{−1}r^k)/(C^{−1}r^{k−1}, C^{−1}r^{k−1}),

so

ŝk = (w^k, w^k)/(w^{k−1}, w^{k−1}).
Thus

α̂k = (r̂^{k−1}, r̂^{k−1})/(p̂^k, Âp̂^k) = (w^{k−1}, w^{k−1})/((C^T p^k)^T C^{−1}Ap^k),

and since (C^T p^k)^T C^{−1}Ap^k = (p^k)^T Ap^k,

α̂k = (w^{k−1}, w^{k−1})/(p^k, p^k).
Further,

x̂^k = x̂^{k−1} + α̂k p̂^k,  i.e.,  C^T x^k = C^T x^{k−1} + α̂k C^T p^k,

and hence

x^k = x^{k−1} + α̂k p^k.

Also,

r̂^k = r̂^{k−1} − α̂k Âp̂^k,  i.e.,  C^{−1}r^k = C^{−1}r^{k−1} − α̂k C^{−1}A(C^{−1})^T p̂^k,

that is,

r^k = r^{k−1} − α̂k Ap^k.

Finally,

p̂^{k+1} = r̂^k + ŝk p̂^k,  i.e.,  C^T p^{k+1} = C^{−1}r^k + ŝk C^T p^k,

which yields

p^{k+1} = (C^{−1})^T C^{−1}r^k + ŝk p^k = (C^{−1})^T w^k + ŝk p^k.
Example 254. The next example illustrates the effect of preconditioning on a poorly conditioned matrix. The linear system Ax = b with

A = [ 0.2  0.1   1    1    0   ]        [ 1 ]
    [ 0.1  4    −1    1   −1   ]        [ 2 ]
    [ 1   −1    60    0   −2   ] ,  b = [ 3 ]
    [ 1    1     0    8    4   ]        [ 4 ]
    [ 0   −1    −2    4   700  ]        [ 5 ]

has the solution

x* = (7.859713071, 0.4229264082, −0.07359223606, −0.5406430164, 0.01062616286)^T.
Solution: The matrix A is symmetric and positive definite but is ill-conditioned, with condition number K∞(A) = ‖A‖∞ ‖A^{−1}‖∞ = 13961.71. We will use tolerance 0.01 and compare the results obtained from the Jacobi and Gauss-Seidel iterative methods and from the conjugate gradient method with C^{−1} = I. Then we precondition by choosing C^{−1} = D^{−1/2}, the diagonal matrix whose diagonal entries are the reciprocals of the positive square roots of the diagonal entries of the positive definite matrix A. The results are presented in the table below; the preconditioned conjugate gradient method gives the most accurate approximation with the smallest number of iterations.

Method                               Number of Iterations   ‖x^(k) − x*‖∞
Jacobi                               49                     0.00305834
Gauss-Seidel                         15                     0.02445559
Conjugate gradient                   5                      0.00629785
Preconditioned conjugate gradient    4                      0.00009312

The preconditioned conjugate gradient method is often used in the solution of linear systems in which the matrix is sparse and positive definite. These systems arise from the discretization by finite element or finite difference methods of boundary value problems. The larger the system, the more impressive the conjugate gradient method becomes, since it significantly reduces the number of iterations required. In these systems, the preconditioning matrix C is approximately equal to L in the Choleski factorization LL^t of A. Generally, small entries in A are ignored and Choleski's method is applied to obtain what is called an incomplete LL^t factorization of A. Thus C^{−1}(C^{−1})^T ≈ A^{−1}, and a good approximation is obtained.
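A sketch of the preconditioned method with C^{−1} = D^{−1/2}, written in the equivalent standard form in which z^k = (C^{−1})^T C^{−1} r^k = D^{−1} r^k (this reformulation, and all names, are ours); it is applied to the ill-conditioned system of Example 254.

```python
# Diagonally preconditioned CG (C^{-1} = D^{-1/2}), in the equivalent
# standard formulation with z = D^{-1} r. Names are illustrative.
import math

def pcg_diag(A, b, x0, tol=1e-10, max_iter=100):
    n = len(A)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = x0[:]
    r = [bi - Axi for bi, Axi in zip(b, matvec(x))]
    z = [r[i] / A[i][i] for i in range(n)]        # z = D^{-1} r
    p = z[:]
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, k
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# the ill-conditioned system of Example 254
A = [[0.2, 0.1, 1, 1, 0],
     [0.1, 4, -1, 1, -1],
     [1, -1, 60, 0, -2],
     [1, 1, 0, 8, 4],
     [0, -1, -2, 4, 700]]
b = [1, 2, 3, 4, 5]
x, k = pcg_diag(A, b, [0.0] * 5)
print(k, [round(v, 6) for v in x])
```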

6.9.5 Generalized Minimum Residual Method (GMRES)

Consider the linear system

Ax = b.

Given an initial guess x0 to this system and the Krylov subspace Km(A, r0), this method consists of minimizing the residual norm over all vectors in x0 + Km. Any vector x in x0 + Km can be written as

x = x0 + Vm y,

where y is a vector in R^m. Let

J(y) = ‖b − Ax‖2 = ‖b − A(x0 + Vm y)‖2.

Then

b − Ax = b − A(x0 + Vm y)
       = r0 − AVm y
       = βv1 − Vm+1 H̄m y
       = Vm+1(βe1 − H̄m y).

Since the column vectors of Vm+1 are orthonormal,

J(y) = ‖b − A(x0 + Vm y)‖2 = ‖βe1 − H̄m y‖2.    (6.53)

The GMRES approximation is the unique vector of x0 + Km which minimizes the functional (6.53). If we denote by ym the vector that minimizes (6.53), this approximation can be obtained as

xm = x0 + Vm ym.

Algorithm (GMRES)
1. Compute r0 = b − Ax0, β = ‖r0‖2, and v1 = r0/β
2. Define the (m + 1) × m matrix H̄m = (hij); set H̄m = 0
3. For j = 1, 2, ..., m Do:
4. Compute wj = Avj
5. For i = 1, ..., j Do:
6. hij = (wj, vi)
7. wj = wj − hij vi
8. EndDo
9. Compute h_{j+1,j} = ‖wj‖2. If h_{j+1,j} = 0, set m = j and GoTo 12
10. Compute v_{j+1} = wj/h_{j+1,j}
11. EndDo
12. Compute ym, the minimizer of ‖βe1 − H̄m y‖2, and xm = x0 + Vm ym
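A sketch of the algorithm above; in step 12 the small least squares problem is solved here via the normal equations, which is adequate for an illustration (production implementations typically use Givens rotations instead). All names are ours, and the 3 × 3 test matrix is arbitrary.

```python
# GMRES sketch: Arnoldi on K_m(A, r0), then minimize ||beta e1 - H y||.
# Names are illustrative; least squares solved via normal equations.
import math

def gauss_solve(M, d):
    """Tiny Gaussian elimination with partial pivoting (helper)."""
    n = len(M)
    M = [row[:] + [di] for row, di in zip(M, d)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for t in range(c, n + 1):
                M[r][t] -= f * M[c][t]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][t] * x[t] for t in range(r + 1, n))) / M[r][r]
    return x

def gmres(A, b, x0, m):
    n = len(A)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    r0 = [bi - Axi for bi, Axi in zip(b, matvec(x0))]
    beta = math.sqrt(sum(ri * ri for ri in r0))
    V = [[ri / beta for ri in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):                          # Arnoldi loop (steps 3-11)
        w = matvec(V[j])
        for i in range(j + 1):
            H[i][j] = sum(w[t] * V[i][t] for t in range(n))
            w = [w[t] - H[i][j] * V[i][t] for t in range(n)]
        H[j + 1][j] = math.sqrt(sum(wt * wt for wt in w))
        if H[j + 1][j] < 1e-12:                 # happy breakdown
            m = j + 1
            H = [row[:m] for row in H[:m + 1]]
            break
        V.append([wt / H[j + 1][j] for wt in w])
    # step 12: min ||beta e1 - H y|| via (H^T H) y = H^T (beta e1)
    G = [[sum(H[t][i] * H[t][j] for t in range(len(H))) for j in range(m)]
         for i in range(m)]
    g = [H[0][i] * beta for i in range(m)]
    y = gauss_solve(G, g)
    return [x0[i] + sum(V[j][i] * y[j] for j in range(m)) for i in range(n)]

A = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]   # arbitrary non-symmetric test matrix
b = [3, 4, 5]
x = gmres(A, b, [0.0, 0.0, 0.0], 3)
print([round(v, 8) for v in x])          # close to (1, 1, 1)
```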

6.10 Multi-grid Methods

The classical iterative methods fail to be effective whenever the spectral radius of the iteration matrix is close to one. A Fourier analysis shows that large eigenvalues are associated with errors of low frequency. Therefore, the smooth (low frequency) components of the error hold back convergence, whereas high frequency errors are rapidly damped.

The basic idea of the method is to change the grid whenever the linear system at hand arises from a discretization of differential equations over a certain mesh. As a matter of fact, errors that are smooth on a grid of width h can be attacked on a coarser grid of width 2h, while high frequency errors that are not visible on the coarse grid can be damped on the fine grid.

6.10.1 Multi-grid Cycles

The basic multi-grid algorithm can be presented in an abstract fashion in the following form. We are concerned with a family of linear problems

Aj x^(j) = b^(j),  j = 0, ..., m    (6.54)

where Aj is an invertible linear operator on a finite dimensional space V^j, with dim V^j < dim V^{j+1}, j = 0, ..., m − 1, and b^(j) is a given right hand side. The goal is to solve (6.54) for j = m. For this reason we omit the subscript m for the highest level, thus writing

V = V^m,  A = Am,  b = b^(m),  x = x^(m).

For each j, problem (6.54) should be regarded as the discretization of a differential equation on a grid whose characteristic mesh spacing is hj. If we assume that h_{j−1} = 2hj and set h = hm, then problem (6.54) provides a family of discretizations with a refinement factor 2.

To solve problem (6.54) we consider a basic iterative method

x^{k+1}_(j) = x^k_(j) + Pj(b^(j) − Aj x^k_(j))    (6.55)

where Pj is a preconditioner for Aj. The associated error e^k_(j) is transformed according to

e^{k+1}_(j) = Bj e^k_(j),  Bj = I − Pj Aj.

The multi-grid algorithm can be regarded as an acceleration of (6.55) for j = m, and can be formulated as follows.

1. Pre-smoothing on the fine grid.
Given x^0_(j), do n1 times:

x^k_(j) = x^{k−1}_(j) + Pj(b^(j) − Aj x^{k−1}_(j)),  k = 1, ..., n1.

2. Coarse grid correction.

(i) Form the residual (or defect) r_(j) = b^(j) − Aj x^{n1}_(j);

(ii) restrict the residual to the coarse grid, setting r_(j−1) = R^{j−1}_j r_(j), where

R^{j−1}_j : V^j → V^{j−1}

is a restriction operator;

(iii) solve the defect problem

A_{j−1} x_(j−1) = r_(j−1);

(iv) correct the solution in V^j by setting

x̄_(j) = x^{n1}_(j) + Π^j_{j−1} x_(j−1),

where

Π^j_{j−1} : V^{j−1} → V^j

is a prolongation operator.

3. Post-smoothing on the fine grid.
Set x̂^0_(j) = x̄_(j) and do n2 times:

x̂^k_(j) = x̂^{k−1}_(j) + Pj(b^(j) − Aj x̂^{k−1}_(j)),  k = 1, ..., n2.

6.11 Exercises

Analysis of a Quadratic Function

6.1. Find the minimum value of the function f(x1, x2, x3) = x1^2 + 2x1x2 + 3x2^2 + 2x2x3 + x3^2 − 2x1 + 3x3 + 2. How do you know that your answer is really the global minimum?

6.2. For the following quadratic functions, determine if there is a minimum. If so, find the minimizer and the minimum value of the function.
a. x1^2 − 2x1x2 + 4x2^2 + x1 − 1
b. 3x1^2 + 3x1x2 + 3x2^2 − 2x1 − 2x2 + 4
c. x1^2 + 5x1x2 + 3x2^2 + 2x1 − x2

6.3. For each matrix A, vector b, and scalar c, write out the quadratic function f(x) given by (6.9). Then either find the minimizer x* and minimum value f(x*), or explain why there is none.

a. A = [ 4 −12 ; −12 45 ], b = (−1/2, 2)^t, c = 3
b. A = [ 3 2 ; 2 1 ], b = (4, 1)^t, c = 0
c. A = [ 3 −1 1 ; −1 2 −1 ; 1 −1 3 ], b = (4, 1)^t, c = −3
d. A = [   ], b = (−1, 2, −3)^t, c = 0

LP and Simplex Algorithm

6.4. Write each linear program as a standard maximization problem using matrix notation. Use the simplex algorithm to maximize the objective function f subject to the given constraints.

(a) f = 4x1 + 3x2, subject to −x1 + x2 ≤ 3, 4x1 − x2 ≤ 1.
(b) f = 2x1 + 3x2, subject to x1 − x2 ≤ 2, 4x1 − x2 ≤ 11, x1 + x2 ≤ 9.
(c) f = 2x1 + 2x2 + x3, subject to 4x1 − 2x2 + x3 ≤ 3, x1 + x2 + x3 ≤ 6.
(d) f = 2x1 + x2 + 2x3, subject to 4x1 + x2 − x3 ≥ −3, x1 − x2 + 2x3 ≤ 5, 3x1 + 2x2 + x3 ≤ 10.

Jacobi and Gauss-Seidel Methods

6.5 Find the first two iterations of the Jacobi method for the following linear systems, using x^(0) = 0.

a.
3x1 − x2 + x3 = 1
3x1 + 6x2 + 2x3 = 0
3x1 + 3x2 + 7x3 = 4

b.
10x1 − x2 = 9
−x1 + 10x2 − 2x3 = 0
−2x2 + 10x3 = 4

c.
10x1 + 5x2 = 6
5x1 + 10x2 − 4x3 = 25
−4x2 + 8x3 − x4 = −11
−x3 + 5x4 = −11

6.6 Repeat Exercise 6.5 using the Gauss-Seidel method.

Conjugate Gradient Method

6.7 Solve the following linear systems Ax = b using the conjugate gradient method, keeping track of the residual vectors and solution approximations as you iterate.

a. A = [ 3  −1 ; −1  5 ],  b = (2, 1)ᵀ

b. A = [ 6  2  1 ; 2  3  −1 ; 1  −1  2 ],  b = (1, 0, −2)ᵀ

c. A = [ 6  −1  −1  5 ; −1  7  1  −1 ; −1  1  3  −3 ; 5  −1  −3  6 ],  b = (1, 2, 0, −1)ᵀ
6.8 The 9 × 9 matrix

A =
[  4 −1  0 −1  0  0  0  0  0 ]
[ −1  4 −1  0 −1  0  0  0  0 ]
[  0 −1  4  0  0 −1  0  0  0 ]
[ −1  0  0  4 −1  0 −1  0  0 ]
[  0 −1  0 −1  4 −1  0 −1  0 ]
[  0  0 −1  0 −1  4  0  0 −1 ]
[  0  0  0 −1  0  0  4 −1  0 ]
[  0  0  0  0 −1  0 −1  4 −1 ]
[  0  0  0  0  0 −1  0 −1  4 ]

arises in the finite difference (and finite element) discretization of the Poisson equation on a square grid with nine interior points. Solve the linear system Ax = b with tolerance 0.05, using
(a) the Jacobi method,
(b) the Gauss-Seidel method,
(c) the conjugate gradient method,
(d) the preconditioned conjugate gradient method with C⁻¹ = D^(−1/2), and
(e) compare the results in (a), (b), (c) and (d).
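For experimenting with Exercise 6.8, the three basic iterative methods can be sketched as follows. Since the exercise does not specify b, the vector of all ones is used here purely as a placeholder:

```python
import numpy as np

def jacobi(A, b, tol=0.05):
    """Jacobi iteration: x^(k+1) = D^{-1} (b - (A - D) x^(k))."""
    D = np.diag(A); x = np.zeros_like(b)
    while np.linalg.norm(b - A @ x) > tol:
        x = (b - (A @ x - D * x)) / D
    return x

def gauss_seidel(A, b, tol=0.05):
    """Gauss-Seidel: sweep through the equations using updated values."""
    n = len(b); x = np.zeros_like(b)
    while np.linalg.norm(b - A @ x) > tol:
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

def conjugate_gradient(A, b, tol=0.05):
    """Conjugate gradient method for symmetric positive definite A."""
    x = np.zeros_like(b); r = b - A @ x; p = r.copy()
    while np.linalg.norm(r) > tol:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        p = r_new + (r_new @ r_new) / (r @ r) * p
        r = r_new
    return x

# 9x9 Poisson matrix from Exercise 6.8 (3x3 interior grid, lexicographic order)
A = 4.0 * np.eye(9)
for i, j in [(0,1),(1,2),(3,4),(4,5),(6,7),(7,8),   # horizontal neighbors
             (0,3),(1,4),(2,5),(3,6),(4,7),(5,8)]:  # vertical neighbors
    A[i, j] = A[j, i] = -1.0
b = np.ones(9)  # placeholder right-hand side (not given in the exercise)
```

All three iterations converge here because A is symmetric positive definite and (weakly) diagonally dominant.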

6.12 Suggestion for Further Reading


1. Application of algorithmic optimization to industrial problems is covered
in detail in [11] and [12].
2. Consult [6] and [7] regarding the application of variational inequalities to superconductivity.
3. Read Chapters 6 and 9 of [15] for advanced study of algorithmic opti-
mization and variational inequality.

4. Many concepts discussed in the chapter are presented effectively in ref-


erences [1] through [5] and [8] through [14].
Bibliography

[1] K. Atkinson, Elementary Numerical Analysis, New York, Wiley, 1985.

[2] K. Atkinson, W. Han, Theoretical Numerical Analysis, Third Edition, Springer, 2010.
[3] O. Axelsson, Iterative Solution Methods, Cambridge University Press, 1994.
[4] U. N. Bhat, An Introduction to Queueing Theory, Modeling and Analysis
in Applications, Springer, 2008.
[5] Z. Dostal, Optimal Quadratic Programming Algorithms, Springer, 2009.
[6] K. M. Furati, A. H. Siddiqi, Fast algorithm for the Bean critical model
for superconductivity, Numerical Functional Analysis and Optimization,
26(2):177-192, 2005.
[7] K. M. Furati, A. H. Siddiqi, Quasi-variational inequality modeling prob-
lems in superconductivity, Numerical Functional Analysis and Optimiza-
tion, 26(2): 193-204, 2005.
[8] P.T. Harker, J.-S. Pang, Finite-dimensional variational inequality and
nonlinear complementarity problems: a survey of theory, algorithms and
applications, Math. Program. 48:161-220 (1990).
[9] R. J. LeVeque, Finite Difference Methods for Ordinary and Partial Dif-
ferential Equations. SIAM, 2007.

[10] R. J. LeVeque, Finite Difference Methods for Ordinary and Partial Differential Equations, SIAM, 2007.
[11] G. M. Lee, N. N. Tam, N. D. Yen, Quadratic Programming and Varia-
tional Inequalities. Springer, 2005.
[12] H. Neunzert and A. H. Siddiqi, Topics in Industrial Mathematics: Case Studies and Related Mathematical Methods, Kluwer Academic Publishers (now Springer), 2000.
[13] H. Neunzert, D. Prätzel-Wolters (Eds.), Currents in Industrial Mathematics: From Concepts to Research to Education, Springer, 2015.


[14] P. Qi Pan, Linear Programming Computation, Springer, 2014.

[15] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, 1996.
[16] A. H. Siddiqi, Functional Analysis with Applications, Springer Nature, 2017.
[17] S. C. Billups, K. G. Murty, Complementarity problems, Journal of Computational and Applied Mathematics 124: 303-318, 2000.
[18] M. Rosenlicht, Introduction to Analysis, Dover Mathematics series, 1968.
[19] M. Reed and B. Simon, Functional Analysis, (Methods of Modern Math-
ematics), Academic Press, 1980.

[20] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards Section B, 49: 409-432, 1952.
[21] M. R. Hestenes, Conjugate Direction Methods in Optimization, Applica-
tion of Mathematics book series (SMAP), Vol 12: 81-149, Springer-Verlag,
1980.
Chapter 7
Computational Numerical Methods in Engineering

7.1 Introduction to Numerical Differentiation . . . . . . . . . . . . . . . . . . . . . . . 495


7.1.1 Introduction to Finite Difference Methods . . . . . . . . . . . . . . 495
7.1.2 Taylor’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
7.1.3 Finite Differences for Function f of Two Variables . . . . . 498
7.1.4 Application to Partial Differential Equations . . . . . . . . . . . 504
7.1.4.1 Poisson’s Problem with Dirichlet Boundary
Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
7.1.4.2 Finite Difference Methods for Parabolic
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
7.1.4.3 Finite Difference Methods for Hyperbolic
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
7.2 Finite Element in One Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
7.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
7.4 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519

7.1 Introduction to Numerical Differentiation


In cases in which an analytical expression for the solution is difficult or impossible to obtain, numerical solutions can be investigated. Many numerical methods are available in the literature; among them is the very popular finite difference method. With this method, a differential equation is replaced by a difference equation, that is, by a system of algebraic (often linear) equations that can be solved on a computer.
The finite difference schemes which will be discussed in the next several sections are developed by approximating derivatives.

7.1.1 Introduction to Finite Difference Methods


We begin with a real valued function f of a single variable x. Let f be a sufficiently differentiable function. First let us recall the definition of the first derivative:

f′(x) = lim_{h→0} [f(x + h) − f(x)] / h.   (7.1)

This means that

[f(x + h) − f(x)] / h   (7.2)

can be a good candidate for an approximation of f′(x) for sufficiently small h. In order to determine how good this approximation is, we use the well-known Taylor formula.

7.1.2 Taylor’s Formula


Let f have n + 1 continuous derivatives in the interval (x − l, x + l). Then for any h such that x + h ∈ (x − l, x + l) there is a number 0 < θ < 1 such that

f(x + h) = f(x) + f′(x)/1! h + f″(x)/2! h² + ··· + f⁽ⁿ⁾(x)/n! hⁿ + f⁽ⁿ⁺¹⁾(x + θh)/(n + 1)! hⁿ⁺¹.   (7.3)

If the derivative f⁽ⁿ⁺¹⁾ is bounded, then we can write

f(x + h) = f(x) + f′(x)/1! h + f″(x)/2! h² + ··· + f⁽ⁿ⁾(x)/n! hⁿ + O(hⁿ⁺¹).   (7.4)

The O has the following meaning. If f(x) and g(x) are two functions such that there exist constants M > 0 and δ > 0 with the property

|f(x)| ≤ M |g(x)| for |x| < δ   (7.5)

then we write

|f(x)| = O(|g(x)|).   (7.6)
If we take n = 1 in (7.3), we can obtain

f(x + h) = f(x) + f′(x)h + O(h²)   (7.7)

which can be written as

f′(x) = [f(x + h) − f(x)] / h + O(h).   (7.8)

From the last expression we have one approximation of f′(x):

f′(x) ≈ [f(x + h) − f(x)] / h,   the forward difference approximation.

Replacing h by −h in (7.7), we obtain

f′(x) = [f(x) − f(x − h)] / h + O(h).   (7.9)

From the last expression we have another approximation of f′(x):

f′(x) ≈ [f(x) − f(x − h)] / h,   the backward difference approximation.   (7.10)
h
Replacing h by −h in (7.4), we have

f(x − h) = f(x) − f′(x)/1! h + ··· + (−1)ⁿ f⁽ⁿ⁾(x)/n! hⁿ + O(hⁿ⁺¹).   (7.11)

If we take n = 2 in (7.4) and (7.11) and subtract (7.11) from (7.4), we get

f′(x) = [f(x + h) − f(x − h)] / 2h + O(h²).

Therefore, we have another approximation of the first derivative:

f′(x) ≈ [f(x + h) − f(x − h)] / 2h,   the central difference approximation.   (7.12)

The truncation errors for the above approximations are given by

E_for(h) = [f(x + h) − f(x)]/h − f′(x) = O(h)   (7.13)
E_back(h) = [f(x) − f(x − h)]/h − f′(x) = O(h)
E_cent(h) = [f(x + h) − f(x − h)]/(2h) − f′(x) = O(h²).
Example 255. Let f(x) = ln x. Approximate f′(2) taking h = 0.1 and h = 0.001 in (a) the forward difference approximation, (b) the backward difference approximation and (c) the central difference approximation.

Solution: We know that f′(x) = 1/x and the exact value of f′(2) is 0.5.

(a) For h = 0.1 we have
f′(2) ≈ [f(2.1) − f(2)]/0.1 = [ln(2.1) − ln(2)]/0.1 ≈ 0.4879.
For h = 0.001,
f′(2) ≈ [f(2.001) − f(2)]/0.001 = [ln(2.001) − ln(2)]/0.001 ≈ 0.49988.

(b) For h = 0.1 we have
f′(2) ≈ [f(2) − f(1.9)]/0.1 = [ln(2) − ln(1.9)]/0.1 ≈ 0.5129.
For h = 0.001,
f′(2) ≈ [f(2) − f(1.999)]/0.001 = [ln(2) − ln(1.999)]/0.001 ≈ 0.50013.

(c) For h = 0.1 we have
f′(2) ≈ [f(2.1) − f(1.9)]/0.2 = [ln(2.1) − ln(1.9)]/0.2 ≈ 0.50042.
For h = 0.001 we have
f′(2) ≈ [f(2.001) − f(1.999)]/0.002 = [ln(2.001) − ln(1.999)]/0.002 ≈ 0.5000.
If we use the Taylor series (7.4) and (7.11), then we obtain the following approximations for the second derivative:

f″(x) ≈ [f(x + 2h) − 2f(x + h) + f(x)] / h²,   forward approximation.   (7.14)

f″(x) ≈ [f(x) − 2f(x − h) + f(x − 2h)] / h²,   backward approximation.   (7.15)

f″(x) ≈ [f(x + h) − 2f(x) + f(x − h)] / h²,   central approximation.   (7.16)

The approximations (7.14) and (7.15) for the second derivative are accurate of first order, and the central difference approximation (7.16) for the second derivative is accurate of second order.

Example 256. Approximate f″(2) for the function f(x) = ln x taking h = 0.1 with the central difference approximation.

Solution: We find f″(x) = −1/x² and so f″(2) = −0.25. With the central difference approximation for the second derivative,

f″(2) ≈ [f(2.1) − 2f(2) + f(1.9)] / (0.1)² = [ln 2.1 − 2 ln 2 + ln 1.9] / 0.01 ≈ −0.25031.
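The difference quotients above are easy to check numerically. The following sketch reproduces Examples 255 and 256 for f(x) = ln x at x = 2:

```python
import math

def forward_diff(f, x, h):
    """First derivative, forward difference: O(h) accurate."""
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h):
    """First derivative, backward difference: O(h) accurate."""
    return (f(x) - f(x - h)) / h

def central_diff(f, x, h):
    """First derivative, central difference: O(h^2) accurate."""
    return (f(x + h) - f(x - h)) / (2 * h)

def central_diff2(f, x, h):
    """Second derivative, central difference: O(h^2) accurate."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

f = math.log
print(forward_diff(f, 2.0, 0.1))   # ~0.4879   (exact f'(2) = 0.5)
print(backward_diff(f, 2.0, 0.1))  # ~0.5129
print(central_diff(f, 2.0, 0.1))   # ~0.50042
print(central_diff2(f, 2.0, 0.1))  # ~-0.25031 (exact f''(2) = -0.25)
```

Note how the central differences, with the same h, are already correct to three or four decimal places, as the O(h²) estimates predict.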

7.1.3 Finite Differences for Function f of Two Variables


The approximation of the derivative of a function of a single variable can be extended to functions of several variables. For example, if f(x, y) is a given function of two variables (x, y) ∈ Ω ⊂ R², for the first partial derivatives we have

fx(x, y) = [f(x + h, y) − f(x, y)]/h + O(h),   forward difference.   (7.17)
fx(x, y) = [f(x, y) − f(x − h, y)]/h + O(h),   backward difference.   (7.18)
fx(x, y) = [f(x + h, y) − f(x − h, y)]/(2h) + O(h²),   central difference.   (7.19)
fy(x, y) = [f(x, y + k) − f(x, y)]/k + O(k),   forward difference.   (7.20)
fy(x, y) = [f(x, y) − f(x, y − k)]/k + O(k),   backward difference.   (7.21)
fy(x, y) = [f(x, y + k) − f(x, y − k)]/(2k) + O(k²),   central difference.   (7.22)

For the partial derivative of second order with respect to x we have

fxx(x, y) = [f(x + 2h, y) − 2f(x + h, y) + f(x, y)]/h² + O(h),   forward difference.   (7.23)
fxx(x, y) = [f(x, y) − 2f(x − h, y) + f(x − 2h, y)]/h² + O(h),   backward difference.   (7.24)
fxx(x, y) = [f(x + h, y) − 2f(x, y) + f(x − h, y)]/h² + O(h²),   central difference.   (7.25)
fyy(x, y) = [f(x, y + k) − 2f(x, y) + f(x, y − k)]/k² + O(k²),   central difference.   (7.26)

Similar expressions hold for the remaining second order and mixed partial derivatives.
Application to ordinary differential equations

Euler's method
The object of the method is to obtain an approximation to the solution of the initial value problem (IVP):

dy/dt = f(t, y),  t ≥ t_0,   y(t_0) = y_0.   (7.27)

Starting from the data given at the initial time t_0, one marches forward in time, computing approximations at the successive times t_1, t_2, .... We will use k to denote the time step, so t_n = t_0 + nk for n ≥ 0. We will use Taylor's theorem to derive Euler's method. Suppose that y(t), the unique solution to (7.27), is twice differentiable, so that for n = 0, 1, 2, ...

y(t_{n+1}) = y(t_n) + (t_{n+1} − t_n) y′(t_n) + (t_{n+1} − t_n)²/2 y″(ξ_n)   (7.28)

for some ξ_n in (t_n, t_{n+1}). Since k = t_{n+1} − t_n, we have

y(t_{n+1}) = y(t_n) + k y′(t_n) + k²/2 y″(ξ_n)   (7.29)

and since y(t) satisfies the differential equation (7.27),

y(t_{n+1}) = y(t_n) + k f(t_n, y(t_n)) + k²/2 y″(ξ_n).   (7.30)

Denote by y_n the approximation of y(t_n) for each n = 0, 1, .... By ignoring the remainder term, we obtain Euler's method

y_{n+1} = y_n + k f(t_n, y_n)  for each n = 0, 1, ...   (7.31)

with the error estimate

|y(t_n) − y_n| = O(k).   (7.32)

TABLE 7.1

tn     yn          y(tn)       |yn − y(tn)|
0.0   0.5000000   0.5000000   0.0000000
0.2   0.8000000   0.8292986   0.0292986
0.4   1.1520000   1.2140877   0.0620877
0.6   1.5504000   1.6489406   0.0985406
0.8   1.9884800   2.1272295   0.1387495
1.0   2.4581760   2.6408591   0.1826831
1.2   2.9498112   3.1799415   0.2301303
1.4   3.4517734   3.7324000   0.2806266
1.6   3.9501281   4.2834838   0.3333557
1.8   4.4281538   4.8151763   0.3870225
2.0   4.8657845   5.3054720   0.4396875

Example 257. Use Euler's method to approximate the solution of the initial value problem

dy/dt = y − t² + 1,  0 ≤ t ≤ 2,   y(0) = 0.5   (7.33)

with k = 0.2. The exact solution is y(t) = (t + 1)² − 0.5eᵗ. Table 7.1 shows a comparison between the approximate values and the exact values at t_n.
Note that the error grows slightly as the value of t increases. This controlled error growth is a consequence of the stability of Euler's method, which implies that the error is expected to grow no faster than linearly.
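Euler's method (7.31) is only a few lines of code. The sketch below reproduces the first column of Table 7.1 for the IVP of Example 257:

```python
import math

def euler(f, t0, y0, k, n_steps):
    """Euler's method: y_{n+1} = y_n + k * f(t_n, y_n)."""
    t, y = t0, y0
    values = [(t, y)]
    for _ in range(n_steps):
        y = y + k * f(t, y)
        t = t + k
        values.append((t, y))
    return values

# IVP of Example 257: y' = y - t^2 + 1, y(0) = 0.5, k = 0.2 on [0, 2]
approx = euler(lambda t, y: y - t**2 + 1, 0.0, 0.5, 0.2, 10)
for t, y in approx:
    exact = (t + 1)**2 - 0.5 * math.exp(t)
    print(f"t = {t:.1f}  y_n = {y:.7f}  error = {abs(y - exact):.7f}")
```

The printed errors grow roughly linearly in t, matching the stability remark above.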

Runge-Kutta Methods
Second order Runge-Kutta method
The second order Runge-Kutta method is also known as the improved Euler method. Starting from the point (t_n, y_n), we compute two slopes:

s1 = f(t_n, y_n)   (7.34)
s2 = f(t_n + k, y_n + k s1).

With these slopes, we define the next value of the dependent variable to be

y_{n+1} = y_n + k (s1 + s2)/2.   (7.35)

An analysis using Taylor's expansion reveals an improvement in the estimate of the truncation error. For the second order Runge-Kutta method, we have

|y(t_n) − y_n| = O(k²).   (7.36)



TABLE 7.2

tn yn y(tn ) |yn − y(tn )|


0.2 0.8292933 0.8292986 0.0000053
0.4 1.2140762 1.2140877 0.0000114
0.6 1.6489220 1.6489406 0.0000186
0.8 2.1272027 2.1272295 0.0000269
1.0 2.6408227 2.6408591 0.0000364
1.2 3.1798942 3.1799415 0.0000474
1.4 3.7323401 3.7324000 0.0000599
1.6 4.2834095 4.2834838 0.0000743
1.8 4.8150857 4.8151763 0.0000906
2.0 5.3053630 5.3054720 0.0001089

Fourth order Runge-Kutta method

The next method is also due to Runge and Kutta. It is probably the most commonly used solution algorithm. For most equations and systems it is suitably fast and accurate. For this method we use four slopes. Starting from the point (t_n, y_n), we compute

s1 = f(t_n, y_n)   (7.37)
s2 = f(t_n + k/2, y_n + (k/2) s1)
s3 = f(t_n + k/2, y_n + (k/2) s2)
s4 = f(t_n + k, y_n + k s3).

With these slopes we define the next value of the dependent variable:

y_{n+1} = y_n + k (s1 + 2 s2 + 2 s3 + s4)/6.   (7.38)

For the fourth order Runge-Kutta method, we have

|y(t_n) − y_n| = O(k⁴).   (7.39)

Example 258. Use the Runge-Kutta method of order four to obtain an approximation to the solution of the initial value problem

dy/dt = y − t² + 1,  0 ≤ t ≤ 2,   y(0) = 0.5   (7.40)

with k = 0.2. The exact solution is y(t) = (t + 1)² − 0.5eᵗ. Table 7.2 shows a comparison between the approximate values and the exact values at t_n.
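A sketch of the classical fourth order Runge-Kutta step, applied to the IVP of Example 258:

```python
import math

def rk4(f, t0, y0, k, n_steps):
    """Classical fourth order Runge-Kutta method (7.37)-(7.38)."""
    t, y = t0, y0
    for _ in range(n_steps):
        s1 = f(t, y)
        s2 = f(t + k / 2, y + (k / 2) * s1)
        s3 = f(t + k / 2, y + (k / 2) * s2)
        s4 = f(t + k, y + k * s3)
        y = y + k * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        t = t + k
    return y

# y' = y - t^2 + 1, y(0) = 0.5, k = 0.2: approximate y(2)
y_approx = rk4(lambda t, y: y - t**2 + 1, 0.0, 0.5, 0.2, 10)
y_exact = (2 + 1)**2 - 0.5 * math.exp(2)  # = 5.3054720
print(y_approx, abs(y_approx - y_exact))  # error ~1e-4, as in Table 7.2
```

With ten steps of the same size k = 0.2, the error drops from about 0.44 (Euler) to about 10⁻⁴, illustrating the O(k⁴) estimate (7.39).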

Multistep methods
In previous sections we have discussed numerical procedures for approximating
the solutions of the initial value problem in which the data at the point t = tn
are used to compute an approximate value of the solution y(tn+1 ) at the next
point t = tn+1 . In other words, the computed value of y at any mesh point
depends only on the data at the preceding mesh point.
These are called one-step methods. However, once approximate values
of the solution y have been obtained at a few points beyond t0 , it is natural to
ask whether we can make use of some of this information, rather than just the
value at the last point, to compute the approximate value at the next point.
More precisely, if y1 at t1 , y2 at t2 ,. . . , yn at tn are known, how can we use
this information to determine yn+1 at tn+1 ? Methods that use information
at more than the last mesh point are referred to as multistep methods. In
this section we describe two types of multistep methods: Adams methods and
backward differentiation formulas.
Adams methods
Recall that

y(t_{n+1}) − y(t_n) = ∫_{t_n}^{t_{n+1}} y′(t) dt   (7.41)

where y(t) is the solution of the initial value problem (7.27). The basic idea
of an Adams method is to approximate y 0 (t) by a polynomial Pk (t) of degree
k and to use the polynomial to evaluate the integral on the right side of
(7.41). The coefficients in Pk (t) are determined by using k + 1 previously
calculated data points. For example, suppose that we wish to use a first degree
polynomial P1 (t) = At + B. Then we need only the two data points (tn , yn )
and (tn−1 , yn−1 ). Since P1 is to be an approximation to y 0 , we require that
P1 (tn ) = f (tn , yn ) and that P1 (tn−1 ) = f (tn−1 , yn−1 ). Now, denoting f (tn , yn )
by fn , A and B must satisfy the equations

Atn + B = fn (7.42)
Atn−1 + B = fn−1 .

Solving for A and B, we obtain

A = (f_n − f_{n−1})/k,   B = (f_{n−1} t_n − f_n t_{n−1})/k.   (7.43)

Replacing y′(t) by P1(t) and evaluating the integral in (7.41), we find that

y(t_{n+1}) − y(t_n) = (A/2)(t²_{n+1} − t²_n) + B(t_{n+1} − t_n).   (7.44)

Finally, we replace y(t_{n+1}) and y(t_n) by y_{n+1} and y_n, respectively, and carry out algebraic simplification to get

y_{n+1} = y_n + (3/2) k f_n − (1/2) k f_{n−1}.   (7.45)

Equation (7.45) is the second order Adams-Bashforth formula. It is an


explicit formula for yn+1 in terms of yn and yn−1 and has a local truncation
error proportional to k 3 .
Note that the first order Adams-Bashforth formula, based on the polyno-
mial P0 (t) = fn of degree zero, is just the original Euler formula.
Adams-Bashforth method
More accurate Adams formulas can be obtained by following the procedure outlined above, but using a higher degree polynomial and correspondingly more data points. For example, suppose that a polynomial P3(t) of degree
more data points. For example, suppose that a polynomial P3 (t) of degree
three is used. The coefficients are determined from the four points (tn , yn ) ,
(tn−1 , yn−1 ), (tn−2 , yn−2 ), and (tn−3 , yn−3 ). Substituting this polynomial for
y 0 (t) in Equation (7.41), evaluating the integral, and simplifying the result,
we eventually obtain the fourth order Adams-Bashforth formula
yn+1 = yn + (k/24) (55fn − 59fn−1 + 37fn−2 − 9fn−3 ) . (7.46)
Adams-Moulton method
A variation of the derivation of the Adams-Bashforth formula gives another set of formulas called the Adams-Moulton formulas. We use a first degree polynomial φ1(t) = αt + β, but we determine the coefficients by using the points (t_n, y_n) and (t_{n+1}, y_{n+1}). Thus α and β must satisfy

αt_n + β = f_n   (7.47)
αt_{n+1} + β = f_{n+1}

and it follows that

α = (f_{n+1} − f_n)/k,   β = (f_n t_{n+1} − f_{n+1} t_n)/k.   (7.48)

Substituting φ1(t) for y′(t) in Equation (7.41) and simplifying, we obtain

y_{n+1} = y_n + (1/2) k f_n + (1/2) k f(t_{n+1}, y_{n+1})   (7.49)

which is the second order Adams-Moulton formula. The presence of f(t_{n+1}, y_{n+1}) on the right hand side shows that the Adams-Moulton formula is implicit, rather than explicit, since the unknown y_{n+1} appears on both sides of the equation.
More accurate higher order formulas can be obtained by using an approxi-
mating polynomial of higher degree. The fourth order Adams-Moulton formula
is
yn+1 = yn + (k/24) (9fn+1 + 19fn − 5fn−1 + fn−2 ) (7.50)
Observe that this is also an implicit formula because yn+1 appears in fn+1 .
Example 259. Use the Adams-Moulton method with k = 0.2 to obtain an approximation to y(0.8) for the solution of the initial value problem

y′ = x + y − 1,   y(0) = 1.

Solution: With a step of k = 0.2, y(0.8) will be approximated by y4. To start the computation, we use the fourth order Runge-Kutta method with t0 = 0, y0 = 1, and k = 0.2 to obtain

y1 = 1.02140000,  y2 = 1.09181796,  y3 = 1.22210646.

Now with the identifications t0 = 0, t1 = 0.2, t2 = 0.4, t3 = 0.6, and f(x, y) = x + y − 1, we find

f0 = f(t0, y0) = 0 + 1 − 1 = 0
f1 = f(t1, y1) = 0.2 + 1.02140000 − 1 = 0.22140000
f2 = f(t2, y2) = 0.4 + 1.09181796 − 1 = 0.49181796
f3 = f(t3, y3) = 0.6 + 1.22210646 − 1 = 0.82210646.

With these values, the Adams-Bashforth formula (7.46) gives the predictor

ȳ4 = y3 + (0.2/24)(55f3 − 59f2 + 37f1 − 9f0) = 1.42535975.

Then, as

f4 = f(t4, ȳ4) = 0.8 + 1.42535975 − 1 = 1.22535975,

the Adams-Moulton formula (7.50) yields the corrected value

y4 = y3 + (0.2/24)(9f4 + 19f3 − 5f2 + f1) = 1.42552788.
The exact value of y(0.8) is y(0.8) = 1.42554093.
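The predictor-corrector pair of Example 259 can be sketched as follows, using RK4 (as in the text) to generate the starting values:

```python
def adams_pc(f, t0, y0, k, n_steps):
    """Fourth order Adams-Bashforth predictor with Adams-Moulton corrector."""
    # generate three starting values with the classical RK4 method
    ts = [t0]; ys = [y0]; fs = [f(t0, y0)]
    for _ in range(3):
        t, y = ts[-1], ys[-1]
        s1 = f(t, y)
        s2 = f(t + k/2, y + (k/2)*s1)
        s3 = f(t + k/2, y + (k/2)*s2)
        s4 = f(t + k, y + k*s3)
        ys.append(y + k*(s1 + 2*s2 + 2*s3 + s4)/6)
        ts.append(t + k)
        fs.append(f(ts[-1], ys[-1]))
    for n in range(3, n_steps):
        # predictor: fourth order Adams-Bashforth (7.46)
        y_pred = ys[n] + (k/24)*(55*fs[n] - 59*fs[n-1] + 37*fs[n-2] - 9*fs[n-3])
        # corrector: fourth order Adams-Moulton (7.50), applied once
        f_pred = f(ts[n] + k, y_pred)
        y_corr = ys[n] + (k/24)*(9*f_pred + 19*fs[n] - 5*fs[n-1] + fs[n-2])
        ts.append(ts[n] + k); ys.append(y_corr); fs.append(f(ts[-1], ys[-1]))
    return ts, ys

ts, ys = adams_pc(lambda x, y: x + y - 1, 0.0, 1.0, 0.2, 4)
print(ys[-1])  # ~1.42552788, vs exact y(0.8) = 1.42554093
```

Here the corrector is applied a single time per step, exactly as in the worked example; in practice it may be iterated until the corrected value settles.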

7.1.4 Application to Partial Differential Equations


7.1.4.1 Poisson’s Problem with Dirichlet Boundary Conditions

To illustrate how difference methods can be used to solve elliptic PDEs, we consider Poisson's equation

−(u_xx + u_yy) = f(x, y)   ∀(x, y) ∈ Ω = (0, 1)²,
u|_∂Ω = g(x, y)   ∀(x, y) ∈ ∂Ω   (7.51)

where ∂Ω denotes the boundary of Ω. We assume that Ω is covered by a uniform grid

x_i = ih,  y_j = jh,  0 ≤ i, j ≤ J,  h = 1/J.   (7.52)

The approximate solution at the point (x_i, y_j) is denoted by u_ij. Using the second order central differences

δ²_x u_ij = (u_{i+1,j} − 2u_ij + u_{i−1,j})/h²   (7.53)

and

δ²_y u_ij = (u_{i,j+1} − 2u_ij + u_{i,j−1})/h²   (7.54)

to approximate the derivatives u_xx and u_yy in (7.51), respectively, we obtain the five point difference scheme

(u_{i+1,j} − 2u_ij + u_{i−1,j})/h² + (u_{i,j+1} − 2u_ij + u_{i,j−1})/h² = −f_ij,  1 ≤ i, j ≤ J − 1   (7.55)

or

u_{i+1,j} + u_{i−1,j} + u_{i,j+1} + u_{i,j−1} − 4u_ij = −h² f_ij.   (7.56)

The boundary condition in (7.51) is replaced by

u_{i,0} = g(x_i, 0) = g_{i,0},  0 ≤ i ≤ J   (7.57)

(and similarly on the other three sides of ∂Ω), and the right hand side by

f_ij = f(x_i, y_j),  1 ≤ i, j ≤ J − 1.   (7.58)
The local truncation error is

T_ij = (1/h²)[u(x_{i+1}, y_j) + u(x_{i−1}, y_j) + u(x_i, y_{j+1}) + u(x_i, y_{j−1}) − 4u(x_i, y_j)] + f(x_i, y_j).   (7.59)

Assuming that the derivatives of u are continuous up to order 4 in both x and y, we find, by Taylor expansion, that

T_ij = (u_xx + u_yy)(x_i, y_j) + (h²/12)(∂⁴u/∂x⁴ + ∂⁴u/∂y⁴)(x_i, y_j) + f(x_i, y_j)   (7.60)

from which we obtain

|T_ij| ≤ (h²/12) max_{(x,y)∈(0,1)²} ( |∂⁴u/∂x⁴| + |∂⁴u/∂y⁴| )   (7.61)

that is, the truncation error is O(h²).
Denote the (J − 1)² dimensional vector

U = [u_{1,1}, ..., u_{J−1,1}; u_{1,2}, ..., u_{J−1,2}; ...; u_{1,J−1}, ..., u_{J−1,J−1}]ᵀ   (7.62)

where T means the transpose. With this notation, the scheme (7.56) leads to a linear system

AU = F   (7.63)

where A is a block tridiagonal matrix of order (J − 1)² given by

A =
[  B −I             ]
[ −I  B −I          ]
[     ·  ·  ·       ]
[       −I  B −I    ]
[          −I  B    ]   (7.64)

with I being the identity matrix of order J − 1, and B being the tridiagonal matrix of order J − 1 given by

B =
[  4 −1             ]
[ −1  4 −1          ]
[     ·  ·  ·       ]
[       −1  4 −1    ]
[          −1  4    ] .

Furthermore, F is a (J − 1)² dimensional vector whose elements are

F_{1,1} = h²f_{1,1} + g_{1,0} + g_{0,1},   (7.65)
F_{i,1} = h²f_{i,1} + g_{i,0},  2 ≤ i ≤ J − 2,
F_{J−1,1} = h²f_{J−1,1} + g_{J−1,0} + g_{J,1},
F_{1,2} = h²f_{1,2} + g_{0,2},
F_{i,2} = h²f_{i,2},  2 ≤ i ≤ J − 2,
F_{J−1,2} = h²f_{J−1,2} + g_{J,2},
⋮
F_{1,J−1} = h²f_{1,J−1} + g_{1,J} + g_{0,J−1},
F_{i,J−1} = h²f_{i,J−1} + g_{i,J},  2 ≤ i ≤ J − 2,
F_{J−1,J−1} = h²f_{J−1,J−1} + g_{J−1,J} + g_{J,J−1}.
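The five point system (7.63) is straightforward to assemble and solve directly for moderate J. A minimal sketch with homogeneous boundary data g = 0 and a manufactured right hand side f = 2π² sin(πx) sin(πy) (chosen here so that the exact solution is u = sin(πx) sin(πy)):

```python
import numpy as np

def solve_poisson(J, f):
    """Solve -(u_xx + u_yy) = f on (0,1)^2, u = 0 on the boundary,
    using the five point scheme (7.56) and a dense direct solve."""
    h = 1.0 / J
    m = J - 1                               # interior points per direction
    A = np.zeros((m * m, m * m))
    F = np.zeros(m * m)
    for j in range(m):                      # y index
        for i in range(m):                  # x index
            p = j * m + i                   # lexicographic ordering as in (7.62)
            A[p, p] = 4.0
            if i > 0:     A[p, p - 1] = -1.0
            if i < m - 1: A[p, p + 1] = -1.0
            if j > 0:     A[p, p - m] = -1.0
            if j < m - 1: A[p, p + m] = -1.0
            F[p] = h**2 * f((i + 1) * h, (j + 1) * h)
    return np.linalg.solve(A, F).reshape(m, m)

# manufactured problem with exact solution u = sin(pi x) sin(pi y)
f = lambda x, y: 2 * np.pi**2 * np.sin(np.pi * x) * np.sin(np.pi * y)
J = 20
U = solve_poisson(J, f)
s = np.sin(np.pi * np.arange(1, J) / J)
err = np.abs(U - np.outer(s, s)).max()
print(err)  # O(h^2), consistent with (7.61)
```

Halving h should reduce the maximum error by roughly a factor of four, matching the O(h²) truncation estimate.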

7.1.4.2 Finite Difference Methods for Parabolic Problems


To illustrate how difference methods can be used to solve parabolic PDEs, we consider the heat equation model: find u(x, t) such that

u_t = c² u_xx,  ∀(t, x) ∈ (0, T) × (0, l),
u(0, t) = u(l, t) = 0,  ∀t ∈ (0, T),   (7.66)
u(x, 0) = f(x),  ∀x ∈ [0, l]

where T denotes the terminal time for the model.
To solve the problem (7.66), we first divide the physical domain (0, T) × (0, l) by N × J uniform grid points

t_n = nΔt,  Δt = T/N,  n = 0, 1, ..., N,   (7.67)
x_j = jΔx,  Δx = l/J,  j = 0, 1, ..., J.   (7.68)
We denote by u_j^n ≈ u(x_j, t_n) the approximate solution at an arbitrary grid point (x_j, t_n), and write h = Δx. To obtain a finite difference scheme, we need to approximate the derivatives in (7.66) by finite differences. Below we give two simple finite difference schemes for solving (7.66).

Explicit scheme
Substituting

u_t(x_j, t_n) ≈ (u_j^{n+1} − u_j^n)/Δt   (7.69)

and

u_xx(x_j, t_n) ≈ (u_{j+1}^n − 2u_j^n + u_{j−1}^n)/h²   (7.70)

into (7.66), we obtain the explicit scheme

(u_j^{n+1} − u_j^n)/Δt = c² (u_{j+1}^n − 2u_j^n + u_{j−1}^n)/h²   (7.71)

that is,

u_j^{n+1} = u_j^n + μ(u_{j+1}^n − 2u_j^n + u_{j−1}^n),  1 ≤ j ≤ J − 1,  0 ≤ n ≤ N − 1,   (7.72)

or

u_j^{n+1} = μ u_{j−1}^n + (1 − 2μ) u_j^n + μ u_{j+1}^n,  1 ≤ j ≤ J − 1,  0 ≤ n ≤ N − 1,   (7.73)

where we denote

μ = c² Δt / h².   (7.74)

The boundary conditions can be approximated directly as

u_0^n = u_J^n = 0,  0 ≤ n ≤ N,   (7.75)

and the initial condition can be approximated as

u_j^0 = f(jh),  0 ≤ j ≤ J.   (7.76)

Note that with the scheme (7.73), the approximate solution u_j^{n+1} at any interior point can be obtained by a simple marching in time.
Example 260. Consider the problem

u_t = u_xx,  0 < x < 1,  0 < t < 0.5,
u(0, t) = u(1, t) = 0,  0 < t < 0.5,
u(x, 0) = sin πx,  0 ≤ x ≤ 1.

TABLE 7.3

t \ x    0.20     0.40     0.60     0.80
0.00    0.5878   0.9511   0.9511   0.5878
0.10    0.2154   0.3486   0.3486   0.2154
0.20    0.0790   0.1278   0.1278   0.0790
0.30    0.0289   0.0468   0.0468   0.0289
0.40    0.0106   0.0172   0.0172   0.0106
0.50    0.0039   0.0063   0.0063   0.0039

TABLE 7.4

Exact                        Approximation
u(0.4, 0.05) = 0.5806        u_2^5  = 0.5758
u(0.6, 0.06) = 0.5261        u_3^6  = 0.5208
u(0.2, 0.10) = 0.2191        u_1^10 = 0.2154
u(0.8, 0.14) = 0.1476        u_4^14 = 0.1442

Using the explicit finite difference approximation, find a numerical solution of the problem at different locations and several time instants, for h = 0.2 and Δt = 0.01, and compare the solution with the exact solution

u(x, t) = e^(−π²t) sin πx.

Since we have homogeneous boundary conditions, the scheme can be written in matrix form

u^{n+1} = A u^n,   u^0 = (f(h), ..., f((J − 1)h))ᵀ

where u^n = (u_1^n, ..., u_{J−1}^n)ᵀ and the (J − 1) × (J − 1) matrix A is given by

[ 1−2μ    μ                     ]
[  μ     1−2μ    μ              ]
[          ·      ·      ·      ]
[               μ    1−2μ    μ  ]
[                     μ    1−2μ ] .

For h = 0.2 and Δt = 0.01 we have μ = Δt/h² = 0.25 < 0.5. Marching the system forward up to t = 0.5, we obtain the approximations presented in Table 7.3. We compare in Table 7.4 a sample of exact values with their corresponding approximations.
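A sketch of the explicit scheme (7.73) for Example 260, reproducing the entries of Table 7.4 (h = 0.2, Δt = 0.01, so μ = 0.25):

```python
import math

def heat_explicit(f, h, dt, J, n_steps):
    """March the explicit scheme u_j^{n+1} = mu*u_{j-1} + (1-2mu)*u_j + mu*u_{j+1}
    with homogeneous Dirichlet boundary conditions (mu = dt/h^2, taking c = 1)."""
    mu = dt / h**2
    u = [f(j * h) for j in range(J + 1)]       # initial condition (7.76)
    for _ in range(n_steps):
        u = ([0.0] +
             [mu * u[j-1] + (1 - 2*mu) * u[j] + mu * u[j+1] for j in range(1, J)] +
             [0.0])                            # boundary condition (7.75)
    return u

u = heat_explicit(lambda x: math.sin(math.pi * x), 0.2, 0.01, 5, 10)  # t = 0.10
exact = math.exp(-math.pi**2 * 0.10) * math.sin(math.pi * 0.2)
print(u[1], exact)  # ~0.2154 vs ~0.2191, as in Table 7.4
```

Since μ = 0.25 < 0.5, the scheme is stable; taking a larger Δt that pushes μ above 0.5 would make the computed values oscillate and blow up.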
Implicit scheme
Similarly, by substituting

u_t(x_j, t_n) ≈ (u_j^n − u_j^{n−1})/Δt   (7.77)

and

u_xx(x_j, t_n) ≈ (u_{j+1}^n − 2u_j^n + u_{j−1}^n)/(Δx)²   (7.78)

into (7.66), another difference scheme can be constructed as

(u_j^n − u_j^{n−1})/Δt = (u_{j+1}^n − 2u_j^n + u_{j−1}^n)/(Δx)².   (7.79)

In this case we obtain an implicit scheme

−μ u_{j−1}^n + (1 + 2μ) u_j^n − μ u_{j+1}^n = u_j^{n−1},  1 ≤ j ≤ J − 1.   (7.80)

Note that we now have to solve a linear system at each time step to obtain the approximate solution u_j^n at all interior points. That is why the scheme (7.80) is called implicit, in order to distinguish it from the explicit scheme (7.73).
Crank-Nicolson scheme
A more rewarding scheme can be derived by averaging the explicit scheme at time level n and the implicit scheme at time level n + 1:

(u_j^{n+1} − u_j^n)/Δt = (1/2) [ (u_{j+1}^n − 2u_j^n + u_{j−1}^n)/h² + (u_{j+1}^{n+1} − 2u_j^{n+1} + u_{j−1}^{n+1})/h² ]   (7.81)

or

−μ u_{j−1}^{n+1} + 2(1 + μ) u_j^{n+1} − μ u_{j+1}^{n+1} = μ u_{j−1}^n + 2(1 − μ) u_j^n + μ u_{j+1}^n.   (7.82)

The advantage of the Crank-Nicolson scheme over the explicit one is that its stability does not depend on the parameter μ: the scheme is unconditionally stable, so there is no restriction on the time step.
Example 261. Use the Crank-Nicolson method to approximate the solution of the problem

u_t = (1/4) u_xx,  0 < x < 2,  0 < t < 0.3,
u(0, t) = u(2, t) = 0,  0 < t < 0.3,
u(x, 0) = sin πx,  0 ≤ x ≤ 2,

with h = 0.25 and Δt = 0.01 (so c² = 1/4). We obtain the approximations presented in Table 7.5; a sample comparison with exact values is given in Table 7.6.

TABLE 7.5

t \ x    0.25     0.50     0.75     1.00     1.25      1.50      1.75
0.00    0.7071   1.0000   0.7071   0.0000   −0.7071   −1.0000   −0.7071
0.05    0.6289   0.8894   0.6289   0.0000   −0.6289   −0.8894   −0.6289
0.10    0.5594   0.7911   0.5594   0.0000   −0.5594   −0.7911   −0.5594
0.15    0.4975   0.7036   0.4975   0.0000   −0.4975   −0.7036   −0.4975
0.20    0.4425   0.6258   0.4425   0.0000   −0.4425   −0.6258   −0.4425
0.25    0.3936   0.5567   0.3936   0.0000   −0.3936   −0.5567   −0.3936
0.30    0.3501   0.4951   0.3501   0.0000   −0.3501   −0.4951   −0.3501

TABLE 7.6

Exact                         Approximation
u(0.75, 0.05) = 0.6250        u_3^5  = 0.6289
u(0.50, 0.20) = 0.6105        u_2^20 = 0.6258
u(0.25, 0.10) = 0.5525        u_1^10 = 0.5594

7.1.4.3 Finite Difference Methods for Hyperbolic Problems


Explicit scheme
We consider the wave equation model: find u(x, t) such that

u_tt = u_xx,  ∀(t, x) ∈ (0, T) × (0, l)   (7.83)

subject to the boundary conditions

u(0, t) = u(l, t) = 0,  ∀t ∈ (0, T)   (7.84)

and the initial conditions

u(x, 0) = f(x)  and  u_t(x, 0) = g(x),  ∀x ∈ [0, l].   (7.85)

Let u_j^n ≈ u(x_j, t_n) be the approximation to the exact solution u(x, t) at the point (x_j, t_n). Now, using the central differences for both the time and space partial derivatives in the wave equation, we obtain

(u_j^{n+1} − 2u_j^n + u_j^{n−1})/(Δt)² = (u_{j+1}^n − 2u_j^n + u_{j−1}^n)/h²   (7.86)

or

u_j^{n+1} = −u_j^{n−1} + 2(1 − δ²) u_j^n + δ²(u_{j+1}^n + u_{j−1}^n),  1 ≤ j ≤ J − 1   (7.87)

where

δ = Δt/h.   (7.88)

To start the approximation, we use Taylor's formula for u(x, t) with respect to t to obtain

u(x_j, t_1) ≈ u(x_j, 0) + u_t(x_j, 0)Δt + u_tt(x_j, 0)(Δt)²/2.   (7.89)

From the wave equation and the initial condition u(x, 0) = f(x) we have

u_tt(x_j, 0) = u_xx(x_j, 0) = f″(x_j).   (7.90)

Using the other initial condition u_t(x_j, 0) = g(x_j), Equation (7.89) becomes

u(x_j, t_1) ≈ u(x_j, 0) + g(x_j)Δt + f″(x_j)(Δt)²/2.   (7.91)

Now if we use the central finite difference approximation for f″(x_j), then (7.91) takes the form

u(x_j, t_1) ≈ u(x_j, 0) + g(x_j)Δt + (Δt)²/(2h²) (f(x_{j+1}) − 2f(x_j) + f(x_{j−1}))   (7.92)

and since f(x_j) = u(x_j, 0) = u_j^0 we have

u(x_j, t_1) ≈ (1 − δ²) u_j^0 + (δ²/2)(u_{j−1}^0 + u_{j+1}^0) + g(x_j)Δt.   (7.93)

The last approximation allows us to have the required first step

u_j^1 = (1 − δ²) u_j^0 + (δ²/2)(u_{j−1}^0 + u_{j+1}^0) + g(x_j)Δt.   (7.94)

The boundary condition (7.84) can be approximated directly as

u_0^n = u_J^n = 0,  n = 1, 2, 3, ...   (7.95)

and the initial condition (7.85) can be approximated as

u_j^0 = f(jh),  1 ≤ j ≤ J − 1.   (7.96)

Writing this set of equations in matrix form gives

u^{n+1} = A u^n − u^{n−1}   (7.97)

where

A =
[ 2(1−δ²)    δ²                          ]
[   δ²    2(1−δ²)    δ²                  ]
[            ·        ·        ·         ]
[                 δ²    2(1−δ²)    δ²    ]
[                         δ²    2(1−δ²)  ] .   (7.98)

The technique defined by (7.87) and (7.94) is known as the explicit finite difference approximation of the wave equation.

Example 262. Consider the wave problem

u_tt = 4u_xx,  0 < x < 1,  t > 0,
u(0, t) = u(1, t) = 0,  t > 0,
u(x, 0) = sin πx,  0 ≤ x ≤ 1,  and  u_t(x, 0) = 0,  0 ≤ x ≤ 1.

It is easily verified that the exact solution of this problem is

u(x, t) = sin πx cos 2πt.

Since here u_tt = c²u_xx with c = 2, the mesh ratio becomes δ = cΔt/h. The finite difference scheme applied to this problem with h = 0.1, Δt = 0.05, δ = 1 and J = 10 reproduces the exact solution at the grid points up to rounding error, a known property of the explicit scheme when δ = 1.
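A sketch of the explicit scheme (7.87) with starting step (7.94), applied to Example 262 (with δ = cΔt/h, c = 2):

```python
import math

def wave_explicit(f, g, c, h, dt, J, n_steps):
    """Explicit scheme for u_tt = c^2 u_xx with homogeneous Dirichlet data:
    u_j^{n+1} = -u_j^{n-1} + 2(1-d^2) u_j^n + d^2 (u_{j+1}^n + u_{j-1}^n)."""
    d2 = (c * dt / h)**2
    u_prev = [f(j * h) for j in range(J + 1)]          # u^0 from (7.96)
    u = [0.0] + [(1 - d2) * u_prev[j] + 0.5 * d2 * (u_prev[j-1] + u_prev[j+1])
                 + g(j * h) * dt for j in range(1, J)] + [0.0]  # u^1 from (7.94)
    for _ in range(n_steps - 1):
        u_next = [0.0] + [-u_prev[j] + 2 * (1 - d2) * u[j]
                          + d2 * (u[j+1] + u[j-1]) for j in range(1, J)] + [0.0]
        u_prev, u = u, u_next
    return u

# Example 262: c = 2, h = 0.1, dt = 0.05, so delta = 1; solution at t = 0.5
u = wave_explicit(lambda x: math.sin(math.pi * x), lambda x: 0.0,
                  2.0, 0.1, 0.05, 10, 10)
exact = [math.sin(math.pi * j * 0.1) * math.cos(2 * math.pi * 0.5)
         for j in range(11)]
err = max(abs(a - b) for a, b in zip(u, exact))
print(err)  # essentially zero: the scheme is exact when delta = 1
```

For δ ≤ 1 the scheme is stable; for δ > 1 the computed values blow up, which is the CFL condition for this problem.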

7.2 Finite Element in One Dimension


To have some idea of the finite element method, we consider an example
on solving one-dimensional boundary value problems. Such examples show
various aspects of the finite element method in the simple context of one-
dimensional problems.
Let us consider a finite element method to solve the boundary value prob-
lem. Find u(x) such that

−u'' + αu = f,  u(0) = 0, u(1) = 0 (7.99)

where f ∈ L2 (0, 1) is given. Let
V = H_0^1(0, 1) = {v ∈ H^1(0, 1) such that v(0) = v(1) = 0} (7.100)

where
H^1(0, 1) = {v ∈ L^2(0, 1) such that dv/dx ∈ L^2(0, 1)} (7.101)

and L2 (0, 1) is the space of square integrable functions on (0, 1).


The weak formulation of the problem is
u ∈ V,  ∫₀¹ (u'v' + αuv) dx = ∫₀¹ f v dx  ∀v ∈ V. (7.102)
Computational Numerical Methods in Engineering 513

Let us develop a finite element method for the problem. For a natural
number N , we partition the set Ω̄ = [0, 1] into N parts:

Ω̄ = ∪_{i=1}^{N} Ki (7.103)

where Ki = [xi−1, xi], 1 ≤ i ≤ N, are called the elements, and the xi, 0 ≤ i ≤ N, are called the nodes, 0 = x0 < x1 < · · · < xN = 1. In this example
we have Dirichlet conditions at x0 and xN . Denote hi = xi − xi−1 and h =
max1≤i≤N hi . The value h is called the mesh size or mesh parameter. We use
continuous piecewise linear functions for the approximation, i.e., we choose

Vh = {vh ∈ C(Ω̄) such that vh|Ki ∈ P1(Ki), 1 ≤ i ≤ N, vh(0) = vh(1) = 0}.

For the basis functions of the space Vh, we introduce the "hat" functions associated with the nodes x1, . . . , xN−1. For i = 1, . . . , N − 1, let

φi(x) = { (x − xi−1)/hi,    xi−1 ≤ x ≤ xi,
        { (xi+1 − x)/hi+1,  xi ≤ x ≤ xi+1,    (7.104)
        { 0,                otherwise.

These functions are continuous and piecewise linear. The first order weak
derivatives of the basis functions are piecewise constants:

φ'i(x) = { 1/hi,     xi−1 ≤ x ≤ xi,
         { −1/hi+1,  xi ≤ x ≤ xi+1,    (7.105)
         { 0,        otherwise.

The corresponding finite element problem is


uh ∈ Vh,  ∫₀¹ (u'h v'h + αuh vh) dx = ∫₀¹ f vh dx  ∀vh ∈ Vh. (7.106)

Write
uh = Σ_{j=1}^{N−1} uj φj.

Note that uj = uh (xj ), 1 ≤ j ≤ N − 1. We see that the finite element


problem (7.106) is equivalent to the following linear system for the unknowns
u1 , . . . , uN −1 :
Σ_{j=1}^{N−1} uj ∫₀¹ (φ'i φ'j + αφi φj) dx = ∫₀¹ f φi dx, 1 ≤ i ≤ N − 1. (7.107)

Let us find the coefficient matrix of the system (7.107) in the case of a uniform
partition, h1 = · · · = hN. The following formulas are useful for this purpose:

∫₀¹ φ'i φ'i−1 dx = −1/h, 2 ≤ i ≤ N − 1,
∫₀¹ (φ'i)² dx = 2/h, 1 ≤ i ≤ N − 1,
∫₀¹ φi φi−1 dx = h/6, 2 ≤ i ≤ N − 1,
∫₀¹ (φi)² dx = 2h/3, 1 ≤ i ≤ N − 1.

We see that in matrix and vector notation, in the case of a uniform partition,
the finite element system (7.107) can be written as

Au = b (7.108)

where
u = (u1 , . . . , uN −1 )T (7.109)
is the unknown vector, a = 2/h + 2αh/3, b = c = −1/h + αh/6,
A = [ a  b
      c  a  b
         c  ...  ...
              ...  a  b
                   c  a ]   (7.110)

is the stiffness matrix, and


b = ( ∫₀¹ f φ1 dx, . . . , ∫₀¹ f φN−1 dx )ᵀ (7.111)

is the load vector. The matrix A is sparse, thanks to the small supports of
the basis functions. One distinguishing feature of the finite element method is
that the basis functions are constructed in such a way that their supports are
as small as possible, so that the corresponding stiffness matrix is as sparse as
possible.
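The assembly and solution of the uniform-mesh system (7.108) can be sketched in Python. This is our own illustration (the helper name `fem_1d` and the manufactured test solution are assumptions, not from the text); it uses the entries a and b above, Simpson's rule for the load vector, and the Thomas algorithm for the tridiagonal solve:

```python
import math

def fem_1d(alpha, f, N):
    """Piecewise-linear finite elements for -u'' + alpha*u = f on (0, 1)
    with u(0) = u(1) = 0, on a uniform mesh of N elements."""
    h = 1.0 / N
    a = 2.0 / h + 2.0 * alpha * h / 3.0      # diagonal entry of A
    b = -1.0 / h + alpha * h / 6.0           # off-diagonal entry of A
    def load(i):                             # b_i = int_0^1 f*phi_i dx (Simpson)
        xi = i * h
        phi = lambda x: max(0.0, 1.0 - abs(x - xi) / h)
        s = 0.0
        for xl, xr in ((xi - h, xi), (xi, xi + h)):
            xm = 0.5 * (xl + xr)
            s += (xr - xl) / 6.0 * (f(xl) * phi(xl)
                                    + 4.0 * f(xm) * phi(xm) + f(xr) * phi(xr))
        return s
    rhs = [load(i) for i in range(1, N)]
    # Thomas algorithm for the symmetric tridiagonal system A u = rhs
    n = N - 1
    diag = [a] * n
    for k in range(1, n):
        m = b / diag[k - 1]
        diag[k] -= m * b
        rhs[k] -= m * rhs[k - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        u[k] = (rhs[k] - b * u[k + 1]) / diag[k]
    return u

# Manufactured check: u = sin(pi x) solves the problem with alpha = 1
# and f = (pi^2 + 1) sin(pi x); nodal errors should be O(h^2).
N = 20
u = fem_1d(1.0, lambda x: (math.pi ** 2 + 1.0) * math.sin(math.pi * x), N)
err = max(abs(u[i] - math.sin(math.pi * (i + 1) / N)) for i in range(N - 1))
print(err)
```

Because A is tridiagonal, the solve costs O(N) operations rather than the O(N³) of a dense elimination — this is the practical payoff of the small supports of the hat functions.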

Spectral method
The spectral element method represents a special case of Galerkin methods
in which the finite dimensional space of test functions is made of continuous
piecewise algebraic polynomials of high degree on each element of a partition
of the computational domain. For ease of exposition, we will focus only on the
one-dimensional problem: Find u(x) such that:

−u'' + αu = f,  u(0) = u(1) = 0.  (7.112)

Weighted Galerkin formulation


Let ω(x) be a weight function (positive and integrable on (0, 1)). We define the weighted inner product in L^2(0, 1) by
(u, v)ω = ∫₀¹ u(x)v(x)ω(x) dx. (7.113)

We consider the approximation of (7.112) by the Galerkin method in the


polynomial space

XN = {φ ∈ PN : φ(0) = φ(1) = 0} . (7.114)

The Galerkin spectral formulation is

−(u''N, vN)ω + α(uN, vN)ω = (fN, vN)ω  ∀vN ∈ XN (7.115)

where fN is an appropriate polynomial approximation of f , which is usually


taken to be the interpolation of f associated with the Gauss-type quadrature
points. We shall see below that by choosing appropriate basis functions of
XN , we can reduce to a linear system with a sparse matrix that can be solved
efficiently.
Given a set of basis functions {φj}_{j=0}^{N−2} of XN, we denote

fk = ∫₀¹ fN φk ω dx,  f = (f0, f1, . . . , fN−2)ᵀ,

uN = Σ_{j=0}^{N−2} ūj φj,  u = (ū0, ū1, . . . , ūN−2)ᵀ,

skj = −∫₀¹ φ''j φk ω dx,  mkj = ∫₀¹ φj φk ω dx

and

S = (skj)_{0≤k,j≤N−2},  M = (mkj)_{0≤k,j≤N−2}.
Taking vN = φk , 0 ≤ k ≤ N −2 in (7.115), we can see that (7.115) is equivalent
to the following linear system

(S + αM )u = f . (7.116)
Next we determine the entries of the matrices S and M for two special cases: ω(x) ≡ 1 and ω(x) ≡ (1 − x²)^{−1/2}.
We set ω(x) ≡ 1 and fN = IN f , the Legendre interpolation polynomial of
f with respect to the Legendre-Gauss-Lobatto points. Then problem (7.115)
becomes
−∫₀¹ u''N vN dx + α∫₀¹ uN vN dx = ∫₀¹ IN f vN dx  ∀vN ∈ XN (7.117)
which is called the Legendre-Galerkin method.
The linear system for (7.117) depends on the choice of the basis functions of
XN , which can be constructed with Legendre polynomials as
φk (x) = Lk (x) + αk Lk+1 (x) + βk Lk+2 (x), k ≥ 0
where the constants αk and βk are uniquely determined by the boundary
conditions of the continuous problem.
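For instance, if one works with shifted Legendre polynomials L̃k(x) = Lk(2x − 1) on (0, 1) — an assumption on our part about the normalization, since L̃k(0) = (−1)ᵏ and L̃k(1) = 1 — the two boundary conditions force αk = 0 and βk = −1, i.e., φk = L̃k − L̃k+2. The Python sketch below (helper names are ours) verifies that these combinations vanish at both endpoints:

```python
def legendre(k, t):
    """Legendre polynomial L_k(t) on [-1, 1] via the three-term recurrence
    (n+1) L_{n+1} = (2n+1) t L_n - n L_{n-1}."""
    p0, p1 = 1.0, t
    if k == 0:
        return p0
    for n in range(1, k):
        p0, p1 = p1, ((2 * n + 1) * t * p1 - n * p0) / (n + 1)
    return p1

def phi(k, x):
    """Candidate basis function on (0, 1): shifted Legendre combination
    L~_k - L~_{k+2}; it vanishes at x = 0 and x = 1 because
    L_k(-1) = (-1)^k and L_k(1) = 1."""
    t = 2.0 * x - 1.0
    return legendre(k, t) - legendre(k + 2, t)

# boundary values vanish for every k
vals = [(phi(k, 0.0), phi(k, 1.0)) for k in range(6)]
print(vals)
```

Bases of this kind lead to the sparse matrices S and M mentioned above, since each φk couples only with a few neighbors under Legendre orthogonality.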

7.3 Exercises
Use the Runge-Kutta method of order 4 (RK4) with step size k = 0.1 to compute a four-decimal approximation to the solution of each of the following initial value problems:
7.1. y 0 = 2x − 3y + 1, y(1) = 5.

7.2. y 0 = 4x − 2y, y(0) = 2.

7.3. y 0 = 1 + y 2 , y(0) = 0.

7.4. y 0 = x2 + y 2 , y(0) = 1.

7.5. y 0 = e−y , y(0) = 0.

7.6. Use the RK4 method with k = 0.1 to approximate y(0.5), where y(x) is the solution of the initial value problem y' = (x + y − 1)², y(0) = 2. Compare this approximate value with the actual value.
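A short helper for these RK4 exercises can be sketched in Python as follows (our own illustration; the function name `rk4` is an assumption). It is checked on exercise 7.3, whose exact solution is y = tan x:

```python
import math

def rk4(f, x0, y0, k, n):
    """Classical fourth-order Runge-Kutta with step size k for y' = f(x, y),
    advancing n steps from the initial condition y(x0) = y0."""
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + k / 2, y + k * k1 / 2)
        k3 = f(x + k / 2, y + k * k2 / 2)
        k4 = f(x + k, y + k * k3)
        y += (k / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        x += k
    return y

# Exercise 7.3: y' = 1 + y^2, y(0) = 0; exact solution y = tan(x)
approx = rk4(lambda x, y: 1 + y * y, 0.0, 0.0, 0.1, 5)   # approximates y(0.5)
print(round(approx, 4), round(math.tan(0.5), 4))
```

The fourth-order accuracy means the global error behaves like O(k⁴), so with k = 0.1 the four printed decimals should agree with the exact value.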

7.7. If the air resistance is proportional to the square of the instantaneous velocity, then the velocity v of a mass m dropped from a given height h is determined from

m dv/dt = mg − ρv², ρ > 0,
v(0) = 0.

Let ρ = 0.125, m = 5 slugs, and g = 32 ft/s².
(a) Use the RK4 method with k = 1 to approximate the velocity v(5).
(b) Use a numerical solver to graph the solution of the IVP on the interval [0, 6].
(c) Use separation of variables to solve the IVP and then find the actual value v(5).

Use the Adams-Moulton method to approximate y(1.0), where y(x) is the solution of the following initial value problems:
7.8. y 0 = 1 + y 2 , y(0) = 0.

7.9. y 0 = y + cos x, y(0) = 1.

7.10. y' = (x − y)², y(0) = 0.

7.11. y' = xy + y, y(0) = 1.

Laplace equation exercises

7.12. Use the numerical scheme (7.53) to solve the following boundary value
problems:
(a)

−(uxx + uyy ) = 0, 0 < x < 2, 0 < y < 1


u(0, y) = 0, u(2, y) = 0, 0 < y < 1
u(x, 0) = 100, u(x, 1) = 0, 0 < x < 2
with mesh size h = 1/2.

(b)

−(uxx + uyy ) = 0, 0 < x < 2, 0 < y < 1


u(0, y) = 0, u(1, y) = 0, 0 < y < 1
u(x, 0) = 0, u(x, 1) = sin πx, 0 < x < 1
with mesh size h = 1/2.

(c)

−(uxx + uyy ) = 0, 0 < x < 2, 0 < y < 1


2
u(0, y) = 18y (1 − y), u(1, y) = 0, 0 < y < 1
u(x, 0) = 0, u(x, 1) = 0, 0 < x < 1
with mesh size h = 1/3.
(d)

−(uxx + uyy) = −2, 0 < x < 2, 0 < y < 1
u(0, y) = , u(π, y) = 1, 0 < y < 1
u(x, 0) = , 0 < x < π
with mesh size h = 1/3.

Heat equation exercises

7.13. Use the explicit finite difference scheme (7.68 to 7.74) to approximate
the solution of the initial boundary value problem

 ut = uxx , 0 < x < 2, 0 < t < 1
u(0, t) = u(2, t) = 0, 0 ≤ t ≤ 1
u(x, 0) = f (x)

with
f(x) = { 1, 0 ≤ x ≤ 1
       { 0, 1 < x ≤ 2,
h = 0.25 and Δt = 0.025.

7.14. Solve exercise 7.13 by the Crank-Nicolson scheme (7.80) with



f(x) = { x, 0 ≤ x ≤ 1
       { 2 − x, 1 ≤ x ≤ 2.

7.15. Solve problem 7.13 by the Crank-Nicolson method with h = 0.25 and Δt = 0.025.

7.16. Solve the boundary value problem

ut = c²uxx, (x, t) ∈ D = {0 < x < a, 0 < t < T}
u(x, 0) = x, u(x, T) = 2T, 0 < x < a,
u(0, t) = 2t, u(a, t) = a, 0 < t < T.

7.17. Use the explicit finite difference scheme for the wave equation to approximate the solution of the initial boundary value problem

utt = c²uxx, 0 < x < l, 0 < t < T
u(0, t) = u(l, t) = 0, 0 ≤ t ≤ T
u(x, 0) = f(x), ut(x, 0) = 0, 0 ≤ x ≤ l
when

(a) c = 1, l = 1, T = 1; f(x) = x(1 − x); h = 0.25, Δt = 0.1.

(b) c = 1, l = 2, T = 1; f(x) = e^{−16(x−1)²}, h = 0.4, Δt = 0.1.

(c) Consider the boundary-value problem (BVP)

utt = uxx, 0 < x < 1, 0 < t < 0.5
u(0, t) = u(1, t) = 0, 0 ≤ t ≤ 0.5
u(x, 0) = sin πx, ut(x, 0) = 0, 0 ≤ x ≤ 1

(i) Use the separation method to solve the above BVP.
(ii) Use the explicit scheme to approximate the solution of the problem with h = 0.25 and Δt = 0.1.
(iii) Compute the absolute error at each interior grid point.

7.4 Suggestion for Further Reading


Numerical methods are used to solve problems in the oil industry and other fields; interested readers may find References [6] and [18] helpful. We suggest References [5], [7], [9] and [19] to readers who want more knowledge of applications of mathematical methods, models and algorithms to various areas of science, engineering and technology. References [1] through [4], [8] and [10] through [17] are also useful for further learning.
Bibliography

[1] W. Ames, Numerical Methods for Partial Differential Equations, Third


Edition, Academic Press, Boston, 1992.
[2] K. Atkinson, W. Han, Theoretical Numerical Analysis, Springer, New
York, 2010.
[3] A. Di Bucchianico, R. M. M. Martheij, M. A. Peletier (Eds.), Progress in
Industrial Mathematics at ECMI 2014, Springer, 2016.
[4] J. C. Butcher, Numerical Methods for Ordinary Differential Equations,
Wiley, Chichester, UK, 2003.
[5] C. S Desai, J. Kundu, Introductory Finite Element Method, CRC Press,
Boca Raton, 2001.
[6] A. Iske, T. Randen (Eds.), Mathematical Methods and Modelling in Hy-
drocarbon Exploration and Production, Springer, 2005.
[7] C. Canuto, M. Y. Hussaini, A. Quarteroni, T. A. Zang, Spectral Methods
in Fluid Dynamics, Springer, New York, 1998.
[8] L. Lapidus, G. F. Pinder, Numerical Solution of Partial Differential Equa-
tions in Science and Engineering, Wiley-Interscience, New York, 1982.
[9] C. R. MacCluer, Industrial Mathematics: Modelling in Industry, Science
and Government, Prentice Hall, 2000.
[10] P. Manchanda, K. Ahmad, A. H. Siddiqi (Eds), Current Trends in In-
dustrial and Applied Mathematics, Anamya Publishers, 2002.
[11] K. W. Morton, D. F. Mayers, Numerical Solution of Partial Differential
Equations, Cambridge University Press, 1996.
[12] R. D. Richtmyer, K. W. Morton, Difference Methods for Initial Value
Problems, Wiley-Interscience, New York, 1967.
[13] L. F. Shampine, Numerical Solutions of Ordinary Differential Equations,
Chapman & Hall, 1994.
[14] A. H. Siddiqi and M. Kocvara (Eds), Trends in Industrial and Applied
Mathematics, Kluwer, 2002.

[15] A. H. Siddiqi, I. Duff and O. Christensen (Eds.), Modern Mathematical


Methods, Models and Algorithms, Anamaya-Anshan, 2006.

[16] A. H. Siddiqi, A. K. Gupta and M. Brokate, Modelling of Engineering
and Technological Problems, American Institute of Physics, Vol. 146, New
York, 2009.
[17] A. H. Siddiqi, R. C. Singh, P. Manchanda (Eds.), Mathematics in Sci-
ence and Technology, Mathematical Methods, Models and Algorithms in
Science and Technology, World Scientific, Singapore, 2011.
[18] A. H. Siddiqi, Emerging Applications of Wavelet Methods, American In-
stitute of Physics (AIP), Vol. 1463, New York, 2012.
[19] A. H. Siddiqi, P. Manchanda, R. Bhardwaj, Mathematical Models, Methods and Applications, Springer, Singapore, 2015.
Chapter 8
Complex Analysis

8.1 Motivation and Historical Development for Complex Analysis . 523


8.2 Functions of Complex Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
8.2.1 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
8.2.2 Geometrical Representation of Complex Numbers . . . . . . 526
8.2.3 Sets in Complex Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
8.2.3.1 Complex Sequences and Series . . . . . . . . . . . . . 536
8.2.3.2 Functions of Complex Variable . . . . . . . . . . . . 537
8.2.3.3 Cauchy-Riemann Equations . . . . . . . . . . . . . . . 541
8.3 Complex Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
8.4 Residues and Residue Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
8.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
8.4.2 Singularities and Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
8.5 Application of Residue Theory for Evaluation of Real Integrals 569
8.5.1 Evaluation of Integrals Involving Trigonometric
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
8.5.2 Evaluation of Several Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . 573
8.6 Conformal Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
8.6.1 Complex Functions as Mappings . . . . . . . . . . . . . . . . . . . . . . . . 576
8.6.2 Conformal Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
8.6.3 Möbius Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
8.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
8.7.1 Electrostatics Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
8.7.2 Heat Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
8.7.3 Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
8.7.4 Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
8.9 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595

8.1 Motivation and Historical Development for Complex


Analysis
As we know, real analysis (calculus) is the study of functions of real variables (f : R → R or f : R² → R). Similarly, complex analysis is the study of functions of complex variables f : C → C, where C denotes the set of complex numbers z = x + iy (x ∈ R, y ∈ R) and the imaginary unit i = √−1 is a root of the algebraic equation x² + 1 = 0. It is well known
that complex analysis was developed as a result of mathematical curiosity but
subsequently it was found very useful in signal and image processing, fluid
flow, quantum mechanics and many other areas of engineering.
It is a common belief that real analysis is more useful than complex analysis, but in practice the reverse is often true. One striking application of complex analysis is in quantum mechanics, where complex numbers represent the probability amplitudes of the states of a system. Complex analysis can be thought of as the study of the calculus of functions of complex numbers.
The story of imaginary numbers began in 1545 with the work of Italian
mathematician Girolamo Cardano (1501-1576). The historical development of
imaginary numbers is covered in John Stilwell’s Mathematics and Its History
(Springer, 2010). Cardano commented that arithmetic dealing with quanti-
ties involved mental tortures and noted that computations sometimes seemed
useless. Many mathematicians of that time agreed with him.
In 1572, Rafael Bombelli, another Italian mathematician, showed the utility of roots of negative numbers. Interested readers are referred to Paul Nahin's book titled An Imaginary Tale: The Story of √−1 (Princeton University
Press, 1998), and its review in the November 1999 issue of the Notices of the AMS,
pages 1233 through 1236. History indicates that the seeds of imaginary num-
bers were planted in the 12th century when Arabian algebra was translated
and introduced to Italy.
René Descartes (1596-1650), the French mathematician and philosopher, coined the term imaginary for such numbers. One of the most distinguished
mathematicians in history, Carl Friedrich Gauss (1777-1855), suggested the
use of complex numbers instead of imaginary numbers. Euler was the first mathematician to propose that the square root of −1 be symbolized by i.
Complex numbers are ordered pairs of real numbers. They may be rep-
resented by points in a plane. The xy plane is known as a complex plane or
Argand’s plane (for Jean Robert Argand, 1768-1822) when displaying com-
plex numbers. Gauss published a work on geometric representation of com-
plex numbers as points in a plane. He also published the first proof of the
fundamental theorem of algebra: every polynomial equation of degree n ≥ 1 with complex coefficients has n roots in the complex numbers.
Great controversy surrounded the acceptance of complex numbers in algebra and analysis around 1770; by 1830, the geometry of complex numbers was accepted by Gauss and then by the rest of the mathematics world. Gauss and
Simon Denis Poisson (1781-1840) initiated studies of complex functions and
their integrals. Augustin Louis Cauchy (1789-1857) contributed to the field in
a series of papers written between 1814 and 1845. He formulated his inte-
gral theorem and related concepts such as independence of path and integral
representations of functions and their derivatives.
Pierre Alphonse Laurent (1813-1854) developed his Laurent series around
1843. Karl Weierstrass of Germany placed complex analysis on a rigorous foundation. Georg Bernhard Riemann (1826-1866) introduced derivatives of functions of complex variables. His work and that of Cauchy led to the naming of the Cauchy-Riemann equations that specify conditions for a complex function to be differentiable at a point.

8.2 Functions of Complex Variables


8.2.1 Complex Numbers
A symbol of the form x + iy or x + yi, where x and y are real numbers and i² = −1, is called a complex number.
Let z = x + iy and w = u + iv be two complex numbers; their sum z + w, their product zw and quotient z/w, w ≠ 0, are defined as follows:
Addition or sum:
z + w = (x + u) + i(y + v).
Multiplication:
zw = (x + iy)(u + iv) = (xu − yv) + i(xv + yu).
Division:
z/w = (x + iy)/(u + iv) = [(x + iy)(u − iv)] / [(u + iv)(u − iv)]
    = [(xu + yv) + i(yu − xv)] / (u² + v²)
    = (xu + yv)/(u² + v²) + i (yu − xv)/(u² + v²).
x is called the real part of the complex number z, often denoted Re z, while y is called the imaginary part of z, often denoted Im z. Two complex numbers z and w are equal, z = w, if the real part of z equals the real part of w (x = u) and the imaginary part of z equals the imaginary part of w (y = v).
Let α be a scalar and z = x + iy a complex number; then αz = αx + iαy. In particular, taking α = −1 gives the difference

z − w = (x + iy) + (−1)(u + iv) = x + iy − u − iv = (x − u) + i(y − v).

Example 263. Let z = 6 + 5i and u = 4 + 3i. Find

(a) z + u  (b) z − u  (c) zu  (d) z/u
Solution: (a) z + u = (6 + 4) + (5 + 3)i = 10 + 8i.
(b) z − u = (6 − 4) + (5 − 3)i = 2 + 2i.
(c) z.u = (6 + 5i)(4 + 3i) = (24 − 15) + (18 + 20)i = 9 + 38i.
(d) z/u = (6 + 5i)/(4 + 3i) = [(6 + 5i)(4 − 3i)]/[(4 + 3i)(4 − 3i)] = (24 + 20i − 18i + 15)/25 = (39 + 2i)/25 = 39/25 + (2/25)i.
The set of all complex numbers is often denoted by C.

Magnitude: The magnitude of z = x + iy is denoted by |z| or |x + iy|


and is defined by
|z| = |x + iy| = √(x² + y²).
Conjugate: The complex conjugate of z = x + iy is denoted by z̄ and is defined by
z̄ = x − iy.
Example 264. (a) Find the magnitudes of 3 + 8i and 2 − 4i.
(b) Find the complex conjugates of 16 + 9i and 4 − 8i.
(c) Show that
Re(z) = (z + z̄)/2 and Im(z) = (z − z̄)/(2i)
where z̄ is the complex conjugate of z = x + iy.
(d) Let z = 5 − 7i. Find |z|².
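These operations can be checked with Python's built-in complex type (an illustration of ours, not part of the text). The numbers are those of Example 263, and the identities are those of Example 264(c):

```python
import cmath, math

z, u = 6 + 5j, 4 + 3j
print(z + u, z - u, z * u, z / u)   # sum, difference, product, quotient

zc = z.conjugate()                  # complex conjugate: 6 - 5j
assert math.isclose(abs(z), math.hypot(z.real, z.imag))  # |z| = sqrt(x^2 + y^2)
assert cmath.isclose((z + zc) / 2, z.real)               # Re(z) = (z + z̄)/2
assert cmath.isclose((z - zc) / (2j), z.imag)            # Im(z) = (z − z̄)/(2i)
```

Python writes x + iy as `x + yj` (engineering notation); the printed product 9 + 38j matches the hand computation in Example 263(c).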

8.2.2 Geometrical Representation of Complex Numbers


Complex numbers can be given two natural geometric interpretations. One
can identify the complex number x+iy with point (x, y) in the plane as shown
in Figure 8.1. Each real number x or x + i0, is identified with the point (x, 0)
on the x-axis (horizontal axis) often called the real axis. A number 0 + iy or
just iy is called a pure imaginary number and is associated with the point
(0, y) on the vertical axis, also called the imaginary axis. This correspondence between complex numbers and points in the plane is referred to as the complex (xy) plane, also known as Argand's plane in tribute to Jean Robert Argand (1768-1822), a Swiss bank employee.
The second geometrical interpretation of complex numbers is through vec-
tors. The complex number z = x + iy, or the point (x, y), may be considered
a vector x(1, 0) + y(0, 1) in the plane which may in turn be represented as an
arrow from the origin to (x, y), see Figure 8.2.
FIGURE 8.1: The Complex Plane

FIGURE 8.2: Complex Number as Vector in Plane

The first component of this vector is Re(z) and the second component is
Im(z). In this interpretation, the definition of addition of complex numbers
is equivalent to the parallelogram law for vector addition because we add two
vectors by adding their respective components, see Figure 8.3.
FIGURE 8.3: Parallelogram Law for Addition of Complex Numbers

Example 265. Draw Argand's diagram (pictorial representation) of z = x + iy and z̄ = x − iy. See Figure 8.4.
Polar form of complex number
As we know rectangular coordinates (x, y) and polar coordinates (r, θ) are
related as

x = r cos θ, y = r sin θ.
Thus
FIGURE 8.4: Argand's Diagram of z and z̄


z = x + iy = r cos θ + ir sin θ = r(cos θ + i sin θ) = re^{iθ},

applying Euler's formula. This representation of z is called the polar form of a complex number.
As we see in Figure 8.5, the polar coordinate r is the distance from the origin

FIGURE 8.5: Polar Coordinates


to the point (x, y) in the xy plane. It is clear that r = √(x² + y²) = |z|. The angle
θ of inclination of the vector z measured in radians from the positive real
axis is positive when measured counterclockwise and negative when measured
clockwise. The angle θ is called an argument of the complex number z and must satisfy tan θ = y/x. The solution of this equation is not unique.
An argument θ of z is often denoted by arg z. The argument of a complex number in the interval −π < θ ≤ π is called the principal argument of z and is denoted by Arg z. It may be noted that
Arg(i) = π/2.
Example 266. Express the following complex numbers in polar form and find their arguments:
(a) −2 + 8i.
(b) 6 + 4i.
(c) 3 − √3 i.
Solution: (a) −2 + 8i = x + iy ⇒ x = −2, y = 8,
|z| = r = √(x² + y²) = √(4 + 64) = √68,
θ = tan⁻¹(y/x) = tan⁻¹(8/(−2)) = tan⁻¹(−4),
−2 + 8i = √68 (cos tan⁻¹(−4) + i sin tan⁻¹(−4)) = √68 e^{i tan⁻¹(−4)}.
(b) r = √(36 + 16) = √52 = 2√13,
θ = tan⁻¹(4/6) = tan⁻¹(2/3),
6 + 4i = 2√13 e^{i tan⁻¹(2/3)}.
(c) Let 3 − √3 i = z ⇒ x = 3, y = −√3,
r = √(x² + y²) = √(9 + 3) = √12 = 2√3,
θ = tan⁻¹(y/x) = tan⁻¹(−√3/3) = tan⁻¹(−1/√3),
z = 2√3 e^{i tan⁻¹(−1/√3)}.

Multiplication and Division in Polar Form


Let
z = x + iy = r(cos θ + i sin θ),
w = u + iv = λ(cos ϕ + i sin ϕ);
then

zw = rλ[(cos θ cos ϕ − sin θ sin ϕ) + i(sin θ cos ϕ + cos θ sin ϕ)]

and for w ≠ 0
z/w = (r/λ)[(cos θ cos ϕ + sin θ sin ϕ) + i(sin θ cos ϕ − cos θ sin ϕ)].
zw can also be written as

zw = rλ[cos(θ + ϕ) + i sin(θ + ϕ)] (8.1)

and z/w, w ≠ 0, is given by

z/w = (r/λ)[cos(θ − ϕ) + i sin(θ − ϕ)]. (8.2)
It follows from (8.1) and (8.2) that

|zw| = |z||w|,  |z/w| = |z|/|w| (8.3)
and
arg(zw) = arg z + arg w,  arg(z/w) = arg z − arg w. (8.4)

Powers and roots of complex numbers


Formulas (8.1) and (8.2) can be used to find integral powers of z, i.e., zⁿ where n is an integer. For
z = r(cos θ + i sin θ)
choose z1 = z and z2 = z; then (8.1) implies
z² = r²[cos(θ + θ) + i sin(θ + θ)] = r²[cos 2θ + i sin 2θ],
z³ = z²z = r²[cos 2θ + i sin 2θ] · r[cos θ + i sin θ] = r³[cos 3θ + i sin 3θ].
Since
arg(1) = tan⁻¹(0/1) = 0,
it follows from (8.2) that
1/z² = z⁻² = r⁻²[cos(−2θ) + i sin(−2θ)].
Continuing in this manner, we obtain a formula for zⁿ:
zⁿ = rⁿ(cos nθ + i sin nθ).
Let z = cos θ + i sin θ; then |z| = r = 1 and by the formula for zⁿ we get
(cos θ + i sin θ)ⁿ = cos nθ + i sin nθ. (8.5)
Equation (8.5) is known as DeMoivre’s formula.
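A quick numerical check of the polar form and DeMoivre's formula can be made with Python's cmath module (our own illustration, not part of the text), using the number (1 + √3 i)⁹ from Example 267(c):

```python
import cmath, math

z = 1 + math.sqrt(3) * 1j
r, theta = cmath.polar(z)          # r = |z| = 2, theta = Arg z = pi/3

# DeMoivre: z^n = r^n (cos n*theta + i sin n*theta)
n = 9
demoivre = r ** n * complex(math.cos(n * theta), math.sin(n * theta))
print(demoivre, z ** n)            # both approximately -512
```

Here 9θ = 3π, so z⁹ = 2⁹(cos 3π + i sin 3π) = −512, which agrees with the direct power `z ** n` up to rounding.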
Example 267. (a) Write the given complex numbers in polar form:
(i) 6i  (ii) −2 − 2√3 i  (iii) −1 + 5i  (iv) 12/(√3 + i)
(b) Write z = 10(cos π/5 + i sin π/5) in the form x + iy.
(c) Find (1 + √3 i)⁹.
(d) Find (i) 8^{1/3}  (ii) i^{1/3}  (iii) (−1 + i)^{1/3}.
Solution: (a) (i) 6i = x + iy ⇒ x = 0, y = 6,
r = √(0² + 6²) = √36 = 6,
θ = tan⁻¹(y/x) = tan⁻¹(6/0) = tan⁻¹ ∞ = π/2,
6i = 6(cos π/2 + i sin π/2) = 6e^{iπ/2}
is the polar form of 6i.



(ii) −2 − 2√3 i = x + iy
⇒ x = −2, y = −2√3
⇒ r = √(4 + 12) = 4,
θ = tan⁻¹(−2√3/(−2)) = tan⁻¹ √3 = 4π/3.
The polar form of −2 − 2√3 i is
4(cos 4π/3 + i sin 4π/3) = 4e^{i4π/3}.

(iii) −1 + 5i = x + iy ⇒ x = −1, y = 5
⇒ r = √(1 + 25) = √26, θ = tan⁻¹(5/(−1)) = tan⁻¹(−5),
−1 + 5i = √26 e^{i tan⁻¹(−5)}.
√ √
(iv) 12/(√3 + i) = 12(√3 − i)/[(√3 + i)(√3 − i)] = (12√3 − 12i)/(3 + 1) = 3√3 − 3i
⇒ x = 3√3, y = −3,
r = √(27 + 9) = 6, θ = tan⁻¹(−3/(3√3)) = tan⁻¹(−1/√3),
12/(√3 + i) = 6(cos(−π/6) + i sin(−π/6)) = 6e^{−iπ/6}.
(b) With r = 10 and θ = π/5,
x = r cos θ = 10 cos(π/5) = 8.0902, y = r sin θ = 10 sin(π/5) = 5.8779,
so z = 8.0902 + 5.8779i.
(c) (1 + √3 i)⁹ = r⁹(cos 9θ + i sin 9θ),
where r = √(1² + (√3)²) = 2,
tan θ = √3/1, θ = tan⁻¹ √3 = π/3,
(1 + √3 i)⁹ = 2⁹(cos 3π + i sin 3π) = 2⁹ cos 3π = −512.

Remark 61. There is no ordering in the set of complex numbers.


Remark 62. The binomial expansion of the real numbers is also valid for complex numbers in the following form:

(z + w)ⁿ = Σ_{j=0}^{n} C(n, j) z^{n−j} w^{j},

where C(n, j) denotes the binomial coefficient.

8.2.3 Sets in Complex Plane


If z = x + iy is any complex number, |z| = √(x² + y²) is the distance from the origin to z (the point (x, y)) in the complex plane.
If w = u + iv is also a complex number, then
|z − w| = |(x − u) + i(y − v)| = √((x − u)² + (y − v)²)

is the distance between z and w in the complex plane (Figure 8.6). If a is a


complex number and ρ is a positive real number such that |z − a| = ρ, the
locus of points satisfying this condition is the circle of radius ρ about a and
often referred as “the circle |z − a| = ρ”. The points z satisfying the inequality
|z − a| < ρ, ρ > 0 lie within, but not on, a circle of radius ρ centred at a
(Figure 8.7). This set is called a neighborhood of a or an open disk.
If a = 0, then any point on the circle |z| = r has the polar form.

z = re^{iθ}
where θ is the angle from the positive real axis to the line from the origin
through z (Figure 8.8).
Definition 88. (a) A point a is said to be an interior point of a set S of the complex plane if some open disk about a lies entirely within S. A subset S of the complex plane is called an open set if every point z of S is an interior point.
(b) A point a is said to be a boundary point of a set of complex numbers
S if every open disk about a contains at least one point in S and at least one
point not in S.
S is called closed if it contains all of its boundary points.
(c) A complex number a is a limit point of a set S of complex numbers if every open disk about a contains at least one point of S different from a.
FIGURE 8.6: |z − w| is Distance between z and w

FIGURE 8.7: The Circle of Radius ρ about a

Remark 63. (a) A boundary point may or may not be in S. No point can
FIGURE 8.8: Circle of Radius r about Origin
simultaneously be an interior and a boundary point, as the definitions of interior point and boundary point are mutually exclusive.
(b) The set of all boundary points of S is called the boundary of S, and is
denoted by ∂S.
(c) Limit point differs from boundary point in that every open disk about the
point contains something from S other than the point itself (see Example
268).
(d) Every point of a set is either an interior point or a boundary point.
Example 268. Let A consist of all complex numbers z = x + iy with y > 0, together with the point −23i; see Figure 8.9. Then −23i is a boundary point of A, because every disk about −23i certainly contains points not in A, but also contains a point in A, namely −23i itself. Every real number (point of the x-axis) is also a boundary point of A.

Example 269. Let B consist of all points satisfying |z − a| < ρ. Every point of B is an interior point because about any point in B we can draw a disk of small enough radius to contain only points in B. Thus B is an open set which does not include any points on its bounding circle |z − a| = ρ.
FIGURE 8.9: −23i is Boundary Point of A

8.2.3.1 Complex Sequences and Series

Major parts of the theory of complex sequences and series are analogous to those of real calculus.
Definition 89. (a) A sequence {zn} is a function whose domain is the set of positive integers n = 1, 2, 3, . . . ; to each n a complex number zn is assigned.
(b) A sequence of complex numbers zn is said to be convergent to a complex
number z if Rezn and Imzn respectively converge to Rez and Imz.
This can also be stated as follows: the complex sequence {zn} converges to z if, given any positive number ε, there is a positive number N such that

|zn − z| < ε

whenever n ≥ N. If zn converges to z, we write zn → z or limn→∞ zn = z.


(c) zn is called a bounded sequence if for some M, |zn | ≤ M for
n = 1, 2, 3 . . . or equivalently a sequence is called bounded if there is a
disk containing all of its terms.
Definition 90. (a) Let {zn} be a complex sequence; then Σ_{n=1}^∞ zn is called a complex series. SN = Σ_{n=1}^N zn is called the Nth partial sum of this series. Σ_{n=1}^∞ zn is said to converge to z if SN converges to z.
(b) For any a,

Σ_{n=0}^∞ azⁿ = a + az + az² + · · · + azⁿ + · · ·

is called a geometric series.


Example 270. (a) Write the 100th term of the sequences {2 + tⁿ} and {i^{n+1}/(n + 2)}.
(b) Examine whether {ni/(n + 3i)} is convergent or not.
(c) Give an example of a divergent complex sequence.
(d) Determine whether the given geometric series are convergent or not:
(i) Σ_{n=1}^∞ (i/2)ⁿ.
(ii) Σ_{n=0}^∞ 3 (2/(1 + 2i))ⁿ.
Solution: (a) 2 + t¹⁰⁰ and i¹⁰¹/102.
(b) Convergent (the limit is i).
(c) {ni}.
(d) (i) Convergent, as a geometric series with common ratio of modulus |i/2| = 1/2 < 1.
(ii) Convergent, as |2/(1 + 2i)| = 2/√5 < 1.
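The convergence of a geometric series with |ratio| < 1 can be observed numerically; the Python sketch below (our own illustration, with a hypothetical helper `partial_sum`) compares partial sums of the two series of Example 270(d) with the limit (first term)/(1 − ratio):

```python
# Partial sums of a geometric series: S_N = sum of the first N terms.
def partial_sum(first, ratio, N):
    s, term = 0j, first
    for _ in range(N):
        s += term
        term *= ratio
    return s

r1 = 1j / 2                        # series (i): sum_{n>=1} (i/2)^n
S1 = partial_sum(r1, r1, 60)
print(S1, r1 / (1 - r1))

r2 = 2 / (1 + 2j)                  # series (ii): sum_{n>=0} 3*(2/(1+2i))^n
S2 = partial_sum(3, r2, 300)
print(S2, 3 / (1 - r2))
```

Since |i/2| = 1/2 and |2/(1 + 2i)| = 2/√5 are both less than 1, the partial sums settle onto the closed-form limits after a moderate number of terms.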

8.2.3.2 Functions of Complex Variable


Recall the definition of a general function: a function from a set A to a set B is a rule of correspondence that assigns to each element of A a unique element of B. If y is the element of B associated with an element x of A, we write y = f(x); A is called the domain of f and the set of all f(x) corresponding to each x is called the range of f. If A is a set of real numbers, f is called a function of a real variable. If A is a set of complex numbers, then f(z) is called a
function of complex variables z or simply a complex function. The value
or image of complex number z will be some complex number w = u + iv, that
is,
w = f (z) = u(x, y) + iv(x, y) (8.6)
where u and v are the real and imaginary parts of w and are real valued
functions.

Example 271. Examples of functions of a complex variable z.


(a) f (z) = z + Im(z).
(b) f (z) = z − Re(z) = x + iy − x = iy.

(c) f (z) = z 2 + 4z.


(d) f(z) = (z + i)/(z² + 4).
(e) f(z) = z/(z² + 1), z ≠ i, z ≠ −i.
(f) Find the image of the line Re(z) = 1 under the mapping f(z) = z².
538 Modern Engineering Mathematics

Remark 64. We cannot draw a graph of a complex function w = f (z) since


a graph would require four axes in a four-dimensional coordinate system. A
complex function w = f (z) can be interpreted as a mapping or transformation
from the z-plane to the w-plane, see Figure 8.10.
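This mapping viewpoint can be made concrete for Example 271(f), which the text leaves unanswered. Writing z = 1 + iy gives z² = (1 − y²) + 2iy, so u = 1 − y², v = 2y, and eliminating y yields the parabola u = 1 − v²/4. A short Python check (the sample points are arbitrary):

```python
# Image of the vertical line Re(z) = 1 under w = z**2.  Every image point
# should lie on the parabola u = 1 - v**2/4 in the w-plane.
points = [complex(1, y) ** 2 for y in [-3, -1, 0, 0.5, 2]]
for w in points:
    u, v = w.real, w.imag
    assert abs(u - (1 - v**2 / 4)) < 1e-12
print("Re(z) = 1 maps onto the parabola u = 1 - v**2/4")
```
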
Definition 91. (a) Suppose that a function f is defined in some neighbourhood
of z0 , except possibly at z0 itself. Then f is said to possess a limit at z0 , written
limz→z0 f (z) = l, if for each ε > 0 there exists a δ > 0 such that |f (z) − l| < ε
whenever 0 < |z − z0 | < δ.
(b) A function f is continuous at a point z0 if limz→z0 f (z) = f (z0 ).
(c) A function of the type

f (z) = an z n + an−1 z n−1 + · · · + a2 z 2 + a1 z + a0 , an 6= 0 (8.7)

is called a polynomial of degree n.


(d)
f (z) = g(z)/h(z), (8.8)
where g and h are polynomials, is called a rational function. f (z) is continuous
except at those points for which h(z) is zero.
Definition 92. Let a complex function f be defined in a neighborhood of a
point z0 . The derivative of f at z0 , denoted by f ′(z0 ), is defined as

f ′(z0 ) = lim∆z→0 [f (z0 + ∆z) − f (z0 )]/∆z (8.9)

provided this limit exists.

FIGURE 8.10: Complex Function as Mapping: (a) z-plane, (b) w-plane

If the limit (8.9) exists, the function f is called differentiable at z0 . The


derivative of the function w = f (z) is also written dw/dz.
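The definition (8.9) requires the same limit no matter how ∆z → 0 in the complex plane. A numerical sketch for f (z) = z² (the point z0, the directions, and the step size 1e-6 are arbitrary choices): every difference quotient should approach f ′(z0 ) = 2z0.

```python
# Difference quotients for f(z) = z**2 at z0 = 1 + i, with dz shrinking along
# four different directions; all quotients approximate f'(z0) = 2*z0.
f = lambda z: z * z
z0 = 1 + 1j
for direction in [1, 1j, 1 + 1j, 1 - 2j]:
    dz = 1e-6 * direction / abs(direction)
    quotient = (f(z0 + dz) - f(z0)) / dz
    assert abs(quotient - 2 * z0) < 1e-5
print("all directional difference quotients agree with 2*z0")
```
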

Remark 65. (i) As in real variables, differentiability implies continuity,


that is, if f is differentiable at z0 then f is continuous at z0 .
(ii) The rules of differentiation are the same as in the calculus of real vari-
ables. If f and g are differentiable at point z and c is a complex constant,
then
(a)
dc/dz = 0,  (d/dz)[cf (z)] = cf ′(z). (8.10)
(b)
(d/dz)[f (z) + g(z)] = f ′(z) + g ′(z). (8.11)
(c)
(d/dz)[f (z)g(z)] = f (z)g ′(z) + f ′(z)g(z). (8.12)
(d)
(d/dz)[f (z)/g(z)] = [g(z)f ′(z) − f (z)g ′(z)]/[g(z)]². (8.13)
(e)
(d/dz) f (g(z)) = f ′(g(z))g ′(z). (8.14)
(f)
(d/dz) z^n = nz^{n−1}, where n is an integer. (8.15)
Definition 93. A complex function w = f (z) is said to be analytic (holo-
morphic) at a point z0 if f is differentiable at z0 and at every point in some
neighbourhood of z0 . A function f is analytic in a domain D if it is analytic at
every point in D. A function that is analytic at every point z is said to be an
entire function.
Example 272. (a) Show that polynomial functions are entire functions.
(b) Find the image or value of the given line under mapping f (z) = z 2 :
(i) x = −3.
(ii) y = x.
(iii) y = −x.
(c) Express the following functions in the form f (z) = u + iv:
(i) f (z) = 6z + 9i.
(ii) f (z) = z 3 − 4z.
(iii) f (z) = z/(z + 1).

(d) Evaluate the given functions at the indicated points:
(i) f (z) = 2x − y² + i(xy³ − 2x² + 1), at z = 2i and at z = 2 + i.
(ii) f (z) = e^x cos y + ie^x sin y, at z = πi/4 and at z = 3 + πi/3.
(e) Find limz→i (4z³ − 5z² + 4z + 5i).
(f) Evaluate limz→1+i (z² − 2z + 2)/(z² − 2i).
(g) Show that limz→0 z̄/z does not exist.

(h) Use the definition of the derivative to show that

f 0 (z) = 2z if f (z) = z 2 .

(i) Use the basic rules of differentiation to find f 0 (z) if


(i) f (z) = 5z 4 − iz 3 + (8 − i)z 2 − 6i.
(ii) f (z) = (z 5 + 3iz 3 )(z 4 + iz 3 + 2z 2 − 6iz).
5z 2 − z
(iii) f (z) = .
z3 + 1
(j) Show that f (z) = z is not differentiable at any point.
Solution: (a) A polynomial is differentiable at every point of the complex plane;
hence polynomial functions are entire.
(b) (i) For f (z) = z² = (x + iy)² we have u(x, y) = x² − y², v(x, y) = 2xy.
Substituting x = −3 gives the parametric equations u = 9 − y², v = −6y.
Using y = −v/6, the first equation gives u = 9 − v²/36.
The graph is a parabola.
(ii) y = x gives u = 0, v = 2x². Since 2x² ≥ 0 for all real values of x,
the image is the origin and the positive v-axis.
(iii) y = −x gives u = 0, v = −2x². Since −2x² ≤ 0 for all real values of
x, the image is the origin and the negative v-axis.

(c) (i) f (z) = 6z + 9i = 6(x + iy) + 9i = 6x + i(9 + 6y), where u(x, y) = 6x
and v(x, y) = 9 + 6y.
(ii) f (z) = z³ − 4z = (x³ − 3xy² − 4x) + i(3x²y − y³ − 4y), so
u(x, y) = x³ − 3xy² − 4x and v(x, y) = 3x²y − y³ − 4y.
(iii)
f (z) = z/(z + 1) = (x + iy)/((x + 1) + iy)
= [(x + iy)/((x + 1) + iy)] · [((x + 1) − iy)/((x + 1) − iy)]
= (x² + y² + x)/((x + 1)² + y²) + i y/((x + 1)² + y²)

⇒ u(x, y) = (x² + y² + x)/((x + 1)² + y²),  v(x, y) = y/((x + 1)² + y²).

8.2.3.3 Cauchy-Riemann Equations


Theorem 58. Let f (z) = u(x, y) + iv(x, y) be differentiable at a point z =
x + iy. Then the first order partial derivatives of u and v exist and satisfy the
Cauchy-Riemann equations
∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y. (8.16)
Proof. We have

f ′(z) = lim∆z→0 [f (z + ∆z) − f (z)]/∆z (8.17)
as f (z) is differentiable at z. Putting

f (z) = u(x, y) + iv(x, y) and ∆z = ∆x + i∆y

in (8.17), we get

u(x + ∆x, y + ∆y) + iv(x + ∆x, y + ∆y) − u(x, y) − iv(x, y)


f 0 (z) = lim .
∆z→0 ∆x + i∆y
(8.18)
Since this limit exists, ∆z can approach zero from any convenient direction.
In particular, if ∆z → 0 horizontally then ∆z = ∆x and so (8.18) becomes

f ′(z) = lim∆x→0 [u(x + ∆x, y) − u(x, y)]/∆x + i lim∆x→0 [v(x + ∆x, y) − v(x, y)]/∆x.
(8.19)

The two limits in (8.19) exist since f is differentiable at the point z. By definition,
the limits in (8.19) are partial derivatives, that is,

f ′(z) = ∂u/∂x + i ∂v/∂x. (8.20)
Now, if we let ∆z → 0 vertically, then ∆z = i∆y and (8.18) takes the form

f ′(z) = lim∆y→0 [u(x, y + ∆y) − u(x, y)]/(i∆y) + i lim∆y→0 [v(x, y + ∆y) − v(x, y)]/(i∆y)
or equivalently
f ′(z) = −i ∂u/∂y + ∂v/∂y. (8.21)
Equating real and imaginary parts of (8.20) and (8.21), we get the Cauchy-
Riemann equations
∂u/∂x = ∂v/∂y,  ∂v/∂x = −∂u/∂y.

Remark 66. The Cauchy-Riemann equations are not sufficient for ensuring
analyticity of f (z) unless the derivatives
∂u/∂x, ∂v/∂x, ∂u/∂y and ∂v/∂y
are continuous. In fact the following result holds.
Suppose real valued functions u(x, y) and v(x, y) are continuous and have
continuous derivatives ∂u/∂x, ∂v/∂x, ∂u/∂y and ∂v/∂y in a domain D. If u and v satisfy the
Cauchy-Riemann equations, then the complex function f (z) = u(x, y) + iv(x, y)
is analytic in D.
Analyticity implies differentiability but not vice versa. We have an analogue of
the above result that gives sufficient conditions for differentiability, namely if
u(x, y) and v(x, y) are continuous and have continuous first order derivatives
in a neighborhood of z, and if u and v satisfy the Cauchy-Riemann equations
at the point z, then f (z) = u(x, y) + iv(x, y) is differentiable at z and f ′(z) is
given by
f ′(z) = ∂u/∂x + i ∂v/∂x = ∂v/∂y − i ∂u/∂y.
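The Cauchy-Riemann equations (8.16) can be probed numerically with central differences. The sketch below (sample point, step size, and thresholds are arbitrary) confirms that e^z satisfies them while the non-analytic function z̄ does not:

```python
# Finite-difference check of the Cauchy-Riemann equations u_x = v_y,
# u_y = -v_x at a sample point.
import cmath

def cr_defect(f, x, y, h=1e-6):
    """Return |u_x - v_y| + |u_y + v_x| via central differences."""
    u = lambda a, b: f(complex(a, b)).real
    v = lambda a, b: f(complex(a, b)).imag
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
    vy = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return abs(ux - vy) + abs(uy + vx)

assert cr_defect(cmath.exp, 0.3, -1.2) < 1e-6            # analytic: equations hold
assert cr_defect(lambda z: z.conjugate(), 0.3, -1.2) > 1 # conj(z): u_x = 1, v_y = -1
print("exp(z) satisfies Cauchy-Riemann; conj(z) does not")
```
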
Remark 67. It may be noted that if a complex function f (z) = u(x, y) +
iv(x, y) is analytic throughout a domain D, then the real functions u(x, y) and
v(x, y) must satisfy the Cauchy-Riemann equations at every point of D.
Example 273. (a) Find u(x, y) and v(x, y) if
(i) f (z) = z 2 .
(ii) f (z) = 1/z, z ≠ 0.
(b) Show that f (z) = (2x² + y) + i(y² − x) is not analytic at any point.
(c) Show that f (z) = Re(z) is not analytic at any point.
(d) Show that the following functions are analytic in an appropriate domain.

(i) f (z) = ex cos y + iex sin y.


x−1 y
(ii) f (z) = −i .
(x − 1)2 + y 2 (x − 1)2 + y 2

Solution: (a) (i) f (z) = z² = (x + iy)² = x² − y² + 2ixy
⇒ u(x, y) = x² − y², v(x, y) = 2xy.
(ii) f (z) = 1/z = 1/(x + iy) = [1/(x + iy)] · [(x − iy)/(x − iy)] = (x − iy)/(x² + y²)
= x/(x² + y²) − i y/(x² + y²)
⇒ u(x, y) = x/(x² + y²), v(x, y) = −y/(x² + y²).

(b) f (z) = (2x² + y) + i(y² − x)

u(x, y) = 2x² + y, v(x, y) = y² − x
∂u/∂x = 4x, ∂v/∂x = −1
∂u/∂y = 1, ∂v/∂y = 2y
The equation ∂u/∂x = ∂v/∂y, that is 4x = 2y, holds only on the line y = 2x,
which contains no open disk,
⇒ f (z) is not analytic at any point, as the Cauchy-Riemann equations fail off this line.

(c) f (z) = Re(z) = x + i0 ⇒ u(x, y) = x, v(x, y) = 0
⇒ ∂u/∂x = 1, ∂v/∂y = 0,
⇒ the first Cauchy-Riemann equation is never satisfied. Hence f (z) is nowhere
analytic.

(d) (i) u(x, y) = ex cos y, v(x, y) = ex sin y


∂u/∂x = e^x cos y,  ∂u/∂y = −e^x sin y
∂v/∂x = e^x sin y,  ∂v/∂y = e^x cos y
⇒ ∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y,
⇒ the Cauchy-Riemann equations are satisfied and the partial derivatives of the first
order are continuous. Thus the given function is analytic for all z.
(ii) u = (x − 1)/((x − 1)² + y²),  v = −y/((x − 1)² + y²)

∂u/∂x = [y² − (x − 1)²]/[(x − 1)² + y²]² = ∂v/∂y,
∂u/∂y = −2y(x − 1)/[(x − 1)² + y²]² = −∂v/∂x,
and f is analytic in any domain not containing z = 1.
Harmonic functions: A real valued function ϕ(x, y) is called harmonic in the do-
main D if ∂²ϕ/∂x², ∂²ϕ/∂y² and ∂²ϕ/∂x∂y exist and are continuous and

∂²ϕ/∂x² + ∂²ϕ/∂y² = 0.
Theorem 59. Let f (z) = u(x, y) + iv(x, y) be analytic in a domain D. Then
the functions u(x, y) and v(x, y) are harmonic.
Proof. Let u(x, y) and v(x, y) have continuous second partial derivatives.
Since f is analytic, the Cauchy-Riemann equations are satisfied. Differen-
tiating both sides of
∂u ∂v
=
∂x ∂y
with respect to x and both sides of
∂u ∂v
=−
∂y ∂x
with respect to y we get
∂²u/∂x² = ∂²v/∂x∂y,  ∂²u/∂y² = −∂²v/∂x∂y.
Under the assumption of continuity, the mixed partials are equal. By adding
these equations we get
∂²u/∂x² + ∂²u/∂y² = ∂²v/∂x∂y − ∂²v/∂x∂y = 0.

This implies that u(x, y) is harmonic. Similarly we can show that

∂²v/∂x² + ∂²v/∂y² = 0.

That is, v(x, y) is harmonic.


Conjugate harmonic functions:
Let u(x, y) be a harmonic function in D then v(x, y) is called a conjugate
harmonic function if u(x, y) + iv(x, y) is analytic function in D.
Example 274. Show that the following functions are harmonic:
(i) u(x, y) = x
(ii) u(x, y) = loge (x2 + y 2 )
(iii) u(x, y) = ex (x cos y − y sin y)

Solution: (i) u(x, y) = x ⇒ ∂u/∂x = 1
and
∂²u/∂x² = 0 and ∂²u/∂y² = 0,
so
∂²u/∂x² + ∂²u/∂y² = 0,
⇒ u(x, y) is harmonic.

(ii) For u = loge (x² + y²),
∂²u/∂x² = 2(y² − x²)/(x² + y²)² and ∂²u/∂y² = 2(x² − y²)/(x² + y²)²,
⇒ ∂²u/∂x² + ∂²u/∂y² = 0 for (x, y) ≠ (0, 0), so u is harmonic there.
(iii)
∂²u/∂x² = 2e^x cos y + e^x (x cos y − y sin y),
∂²u/∂y² = e^x (−x cos y + y sin y − 2 cos y)
⇒ ∂²u/∂x² + ∂²u/∂y² = 0.
Thus u is harmonic.
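Harmonicity can also be spot-checked numerically with a five-point finite-difference Laplacian; a Python sketch for the function of part (iii) (the sample points, step size, and tolerance are arbitrary):

```python
# Five-point discrete Laplacian as a sanity check that
# u(x, y) = exp(x) * (x*cos(y) - y*sin(y)) is harmonic.
import math

def laplacian(u, x, y, h=1e-4):
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4 * u(x, y)) / h**2

u = lambda x, y: math.exp(x) * (x * math.cos(y) - y * math.sin(y))
for x, y in [(0.0, 0.0), (1.0, -0.5), (-2.0, 3.0)]:
    assert abs(laplacian(u, x, y)) < 1e-4
print("numerical Laplacian of u is ~0 everywhere tested: u is harmonic")
```
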
Example 275. Let u(x, y) = x² − y². Find a conjugate harmonic function of
u.
Solution: u(x, y) = x2 − y 2 is harmonic.
Now
∂u/∂x = 2x = ∂v/∂y
⇒ v = 2xy + h(x)
∂u/∂y = −2y = −∂v/∂x = −2y − h′(x)
implies h′(x) = 0 or h(x) = c. Therefore

f (z) = x² − y² + i(2xy + c)

⇒ v(x, y) = 2xy + c
is a conjugate harmonic function of u.
Exponential and logarithmic functions

e^z = e^{x+iy} = e^x (cos y + i sin y).

For z ≠ 0 and θ = arg z,

ln z = loge |z| + i(θ + 2nπ), n = 0, ±1, ±2, . . . .

Trigonometric and hyperbolic functions

For any complex number z = x + iy

sin z = (e^{iz} − e^{−iz})/(2i) and cos z = (e^{iz} + e^{−iz})/2
sinh z = (e^z − e^{−z})/2 and cosh z = (e^z + e^{−z})/2.

The inverse hyperbolic sine and cosine functions are defined by
sinh⁻¹ z = ln[z + (z² + 1)^{1/2}]
cosh⁻¹ z = ln[z + (z² − 1)^{1/2}].
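These identities are easy to confirm with Python's `cmath` module at a sample point (the point is arbitrary; `cmath` uses principal branches for `log` and `sqrt`):

```python
# Check sin z = (e^{iz} - e^{-iz})/(2i), and that w = ln[z + (z**2 + 1)**(1/2)]
# really inverts sinh, i.e. sinh(w) = z.
import cmath

z = 0.7 - 1.3j
assert abs(cmath.sin(z) - (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j) < 1e-12

w = cmath.log(z + cmath.sqrt(z * z + 1))  # the inverse-sinh formula above
assert abs(cmath.sinh(w) - z) < 1e-12     # sinh of it recovers z
print("sin and inverse-sinh identities hold numerically")
```
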

8.3 Complex Integration


In complex variables, a piecewise smooth curve C is also called a contour
or path. An integral of f (z) on C is denoted by ∫_C f (z)dz or ∮_C f (z)dz. If
the contour C is closed, the integral is referred to as a contour integral or a complex
line integral. If f is continuous on a smooth curve C given by z(t) = x(t) +
iy(t), a ≤ t ≤ b, then

∫_C f (z)dz = ∫_a^b f (z(t))z ′(t)dt. (8.22)

Remark 68. If f is expressed in terms of the symbol z then to evaluate


f (z(t)) we replace z by z(t). If f is not expressed in terms of z, then to
evaluate f (z(t)) we replace x and y wherever they appear by x(t) and y(t)
respectively.
Example 276. (i) Evaluate ∫_C z̄ dz, where C is given by x = 3t, y = t², −1 ≤ t ≤ 4.
(ii) Evaluate ∫_C (z + 3)dz, where C is x = 2t, y = 4t − 1, 1 ≤ t ≤ 3.
(iii) Evaluate ∫_C z² dz, where C is z(t) = 3t + 2it, −2 ≤ t ≤ 2.

Review of line integrals and independence of paths


The notion of the definite integral ∫_a^b f (x)dx, that is, integration defined over
an interval, can be generalized to integration of a function defined along a curve.
We require the following concepts. Suppose C is a curve parameterized by

x = f (t), y = g(t), a ≤ t ≤ b

and A and B are the points (f (a), g(a)) and (f (b), g(b)) respectively.
A curve C is smooth if f ′ and g ′ are continuous on the closed interval
[a, b] and not simultaneously zero on the open interval (a, b), see Figure 8.11a.
C is piecewise smooth if it consists of a finite number of smooth curves
C1 , C2 , . . . , Cn joined end to end, namely C = C1 ∪ C2 ∪ C3 ∪ · · · ∪ Cn (see
Figure 8.11b). C is closed if A = B, see Figure 8.12a.
C is a simple closed curve if A = B and the curve does not cross itself,
see Figure 8.12b.
If C is not a closed curve, then the positive direction on C is the direction
corresponding to increasing values of t.

FIGURE 8.11a: Smooth Curve

Definition 94. Let f (z) be a complex function defined along a curve C in
the complex plane, with

f (z) = u(x, y) + iv(x, y)




FIGURE 8.11b: Piecewise Smooth Curve

FIGURE 8.12a: Closed Piecewise Smooth Curve

FIGURE 8.12b: Simple Closed Curve

and let C be a smooth curve defined by

x = x(t), y = y(t), a ≤ t ≤ b.
Furthermore divide C into n subarcs according to the partition
a = t0 < t1 < t2 < · · · < tn = b
of [a, b]. The corresponding points on the curve C are
z0 = x(t0 ) + iy(t0 ), z1 = x(t1 ) + iy(t1 ), . . . , zn = x(tn ) + iy(tn ).

Let
∆zk = zk − zk−1 , k = 1, 2, . . . , n,
and let zk∗ be any point on the subarc joining zk−1 to zk . Assume that ‖P ‖ denotes
the norm of the partition, that is, ‖P ‖ = maximum of |∆zk |. The contour integral
or integral of f (z) on C, denoted by ∫_C f (z)dz, is defined by

∫_C f (z)dz = lim‖P ‖→0 ∑_{k=1}^n f (zk∗ )∆zk .

Properties of contour integrals


Theorem 60. Let f and g be continuous in a domain D and C be a smooth
curve inside D. Then
(a) ∫_C kf (z)dz = k ∫_C f (z)dz, for a complex constant k.
(b) ∫_C [f (z) + g(z)]dz = ∫_C f (z)dz + ∫_C g(z)dz.
(c) For C = C1 ∪ C2 , where C1 and C2 are smooth curves, ∫_C f (z)dz =
∫_{C1} f (z)dz + ∫_{C2} f (z)dz.
(d) ∫_{−C} f (z)dz = − ∫_C f (z)dz, where −C denotes the curve having the opposite
orientation of C.
(e) If f is continuous on C and |f (z)| ≤ M for all z on C, then
|∫_C f (z)dz| ≤ M L, where L is the length of C.

FIGURE 8.13a: Simply Connected Domain

We concentrate here on contour integrals where the contour C is a simple
closed curve. In this case the contour integral of f (z) on C is denoted by
∮_C f (z)dz. A domain D, on which f is defined, is called simply connected
(Figure 8.13a) if every simple closed curve C inside D can be shrunk to a
point without leaving D. In other words a simply connected domain has no
holes. A domain that is not simply connected is called a multiply connected
domain (Figure 8.13b). A domain having one hole is called doubly connected
while a domain with two holes is triply connected. It may be noted that the
entire complex plane is simply connected.

FIGURE 8.13b: Multiply Connected Domain

Theorem 61. (Cauchy Theorem) Let f be an analytic function in a simply
connected domain D and f ′ continuous in D. Then ∮_C f (z)dz = 0 for every
simple closed contour C in D. See Figure 8.14.
In 1883, another French mathematician, Édouard Goursat, generalized
the Cauchy theorem by dropping the continuity requirement on f ′.
Theorem 62. (Goursat Theorem) Let f be analytic in a simply connected
domain D. Then ∮_C f (z)dz = 0 for every simple closed contour C.
It may be observed that neither the Goursat nor the Cauchy theorem
can be extended to multiply connected domains in that form. However, the
following result holds.

FIGURE 8.14: Proof of Cauchy Integral Formula

Theorem 63. Let f (z) be analytic in a domain D1 that contains D and its
boundary curves, where D is a doubly connected domain with outer boundary
C1 and inner boundary C2 . Then

∮_{C1} f (z)dz = ∮_{C2} f (z)dz

where both integrals are taken counterclockwise (or both taken clockwise).



Proof. The proof is based on Green’s theorem and the Cauchy-Riemann equa-
tions. Since f ′ is continuous throughout D, the real and imaginary parts of
f (z) = u(x, y) + iv(x, y) and their first derivatives are continuous on D. We
have

∮_C f (z)dz = lim ∑ (u + iv)(∆x + i∆y)
= lim [∑ (u∆x − v∆y) + i ∑ (v∆x + u∆y)]
= ∮_C u(x, y)dx − v(x, y)dy + i ∮_C v(x, y)dx + u(x, y)dy
= ∬_D (−∂v/∂x − ∂u/∂y) dA + i ∬_D (∂u/∂x − ∂v/∂y) dA (8.23)

by applying Green’s theorem to each line integral. Since f is analytic, we have
∂u/∂x = ∂v/∂y and ∂u/∂y = −∂v/∂x,
which clearly imply that both integrals on the right hand side of (8.23) are zero.
Hence ∮_C f (z)dz = 0.

Remark 69. It has been proved that analyticity implies path independence,
namely the value of ∫_C f (z)dz is the same for every contour C joining two
fixed points if f is analytic in a simply connected domain.
Example 277. Evaluate the following integrals:
(a) ∮_C e^z dz, where C is a simple closed curve.
(b) ∮_C e^{z²} dz, where C is a simple closed curve.
(c) ∮_C (z³ − 1 + 3i)dz, where C is the unit circle |z| = 1.

Solution: (a) Since e^z is analytic in a domain containing C, where C is a
simple closed curve, by Goursat’s theorem

∮_C e^z dz = 0.

(b) Since e^{z²} is analytic in a domain containing the simple closed curve C,

∮_C e^{z²} dz = 0.

(c) z³ − 1 + 3i is an entire function, being a polynomial, and |z| = 1 is a simple
closed curve (unit circle). Hence

∮_{|z|=1} (z³ − 1 + 3i)dz = 0.
Example 278. (a) Evaluate ∫_C 2z dz, where C is a curve with initial
point z = −1 and terminal point z = −1 + i.
(b) Evaluate ∫_C 2z dz, where C is z(t) = 2t³ + i(t⁴ − 4t³ + 2), −1 ≤ t ≤ 1.
Solution: (a) Since f (z) = 2z is analytic in the entire complex plane, we can
replace the curve C by any convenient curve C1 joining z = −1 and z = −1 + i.
Let us choose C1 to be the straight line segment x = −1, 0 ≤ y ≤ 1, so that
z = −1 + iy, dz = i dy. Therefore

∫_C 2z dz = ∫_{C1} 2z dz = ∫_0^1 2(−1 + iy)i dy = −2 ∫_0^1 y dy − 2i ∫_0^1 dy = −1 − 2i.

(b) The given integral is independent of the path. The endpoints are z(−1) = −2 + 7i
and z(1) = 2 − i, and therefore

∫_C 2z dz = ∫_{−2+7i}^{2−i} 2z dz = [z²]_{−2+7i}^{2−i} = 48 + 24i.

Example 279. Show that

∮_C (z − z0 )^n dz = 2πi if n = −1, and 0 if n ≠ −1 (n an integer),

where C is the circle with center z0 and radius r (Figure 8.15).


Solution: C can be written in the form

z(θ) = z0 + r(cos θ + i sin θ) = z0 + re^{iθ}, 0 ≤ θ ≤ 2π.

Then
(z − z0 )^n = r^n e^{inθ}
and
dz = ire^{iθ} dθ.

We get

∮_C (z − z0 )^n dz = ∫_0^{2π} r^n e^{inθ} ire^{iθ} dθ = ir^{n+1} ∫_0^{2π} e^{i(n+1)θ} dθ.

If we apply the Euler formula, the right hand side takes the form

ir^{n+1} [ ∫_0^{2π} cos(n + 1)θ dθ + i ∫_0^{2π} sin(n + 1)θ dθ ].

For n = −1, we have r^{n+1} = 1, cos 0 = 1, sin 0 = 0. Thus

∮_C (z − z0 )^{−1} dz = i ∫_0^{2π} dθ = 2πi.

FIGURE 8.15: Path (Curve) of Example 279

For n ≠ −1 each of the two integrals is zero, as we integrate over an interval of
length 2π, equal to a period of sine and cosine. Thus we obtain the desired
result.
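This key integral is easy to verify numerically for several integers n (the center, radius, and grid size below are arbitrary choices):

```python
# Verify Example 279 around the circle |z - z0| = r: the integral of
# (z - z0)**n is 2*pi*i for n = -1 and zero for other integers n.
import cmath, math

def circle_integral(n, z0=1 + 1j, r=0.5, N=10_000):
    total = 0j
    for k in range(N):
        theta = 2 * math.pi * (k + 0.5) / N
        z = z0 + r * cmath.exp(1j * theta)
        dz = 1j * r * cmath.exp(1j * theta)
        total += (z - z0) ** n * dz * (2 * math.pi / N)
    return total

assert abs(circle_integral(-1) - 2j * math.pi) < 1e-9
for n in (-3, -2, 0, 1, 2):
    assert abs(circle_integral(n)) < 1e-9
print("only n = -1 gives a nonzero integral, namely 2*pi*i")
```
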
Theorem 64. (Cauchy’s integral formula)
(a) Let f be analytic in a simply connected domain D, and let C be a simple
closed contour in D. If z0 is any point inside C, then the value of f at z0 is
given by

f (z0 ) = (1/2πi) ∮_C f (z)/(z − z0 ) dz. (8.24)
(b) The value of the nth derivative of f at z0 is given by

f^{(n)}(z0 ) = (n!/2πi) ∮_C f (z)/(z − z0 )^{n+1} dz. (8.25)

(c) (Cauchy’s inequality)

|f^{(n)}(z0 )| ≤ n!M/r^n , (8.26)

where |z − z0 | = r and M is a real number such that |f (z)| ≤ M for all
points z on C.
(d) (Liouville’s theorem) The only bounded entire functions are constants.
(e) (Fundamental theorem of algebra) Let P (z) be a non-constant polyno-
mial, then equation P (z) = 0 has at least one root.

(f ) (Morera’s theorem – converse of Cauchy’s theorem) Let f (z) be contin-
uous in a simply connected domain D. If

∮_C f (z)dz = 0

for every closed contour C in D, then f (z) is analytic in D.

Proof. (a) (Cauchy’s integral formula)
We can write
f (z) = f (z0 ) + [f (z) − f (z0 )]
so that

∮_C f (z)/(z − z0 ) dz = ∮_C f (z0 )/(z − z0 ) dz + ∮_C [f (z) − f (z0 )]/(z − z0 ) dz. (8.27)
The first term of (8.27) is f (z0 )2πi by Example 279. The theorem is proved
if we show that the second term is zero. The integrand in the second term
is analytic except at z0 . Replace C by a circle of radius ρ and center z0 , see
Figure 8.16. For ε > 0, we can find a δ > 0 such that |f (z) − f (z0 )| < ε for all

FIGURE 8.16: Proof of Cauchy’s integral formula

z in the disk |z − z0 | < δ (since f (z) is continuous, being analytic). Choosing
the radius ρ smaller than δ, we have

|[f (z) − f (z0 )]/(z − z0 )| < ε/ρ

at each point of the circle. The length of the circle is 2πρ. Hence by Theorem
60(e)

|∮_{|z−z0|=ρ} [f (z) − f (z0 )]/(z − z0 ) dz| < (ε/ρ) 2πρ = 2πε.

Since ε > 0 is arbitrary, this implies the second term of (8.27) is zero and the result is proved.

(b) The theorem is proved by the principle of mathematical induction. The
statement is true for n = 1, that is,

f ′(z0 ) = (1/2πi) ∮_C f (z)/(z − z0 )² dz. (8.28)

Indeed,

f ′(z0 ) = lim∆z→0 [f (z0 + ∆z) − f (z0 )]/∆z (8.29)
= lim∆z→0 (1/2πi∆z) [ ∮_C f (z)/(z − (z0 + ∆z)) dz − ∮_C f (z)/(z − z0 ) dz ]
or
f ′(z0 ) = lim∆z→0 (1/2πi) ∮_C f (z)/[(z − z0 − ∆z)(z − z0 )] dz. (8.30)

Since f is continuous on C, it is bounded, that is, there exists a positive


constant M such that |f (z)| ≤ M for all z on C. Further, let L be the length
of C and let δ denote the shortest distance between points in C and the point
z0 implying
|z − z0 | ≥ δ
or
1/|z − z0 |² ≤ 1/δ²
for all points z on C.
If we choose |∆z| ≤ δ/2, then
|z − z0 − ∆z| ≥ | |z − z0 | − |∆z| | ≥ δ − |∆z| ≥ δ/2
and so
1/|z − z0 − ∆z| ≤ 2/δ.
Now,

| ∮_C f (z)/(z − z0 )² dz − ∮_C f (z)/[(z − z0 − ∆z)(z − z0 )] dz |
= | ∮_C −∆z f (z)/[(z − z0 )²(z − z0 − ∆z)] dz | ≤ 2M L|∆z|/δ³. (8.31)
Because the last expression approaches zero as 4z → 0, we get

f (z0 + 4z) − f (z0 )


I
1 f (z)
f 0 (z0 ) = lim = dz (8.32)
4z→0 4z 2πi C (z − z0 )2

Assume that

f^{(n)}(z0 ) = (n!/2πi) ∮_C f (z)/(z − z0 )^{n+1} dz.

Then a similar difference-quotient argument applied to f^{(n)} gives

f^{(n+1)}(z0 ) = ((n + 1)!/2πi) ∮_C f (z)/(z − z0 )^{n+2} dz,

which completes the induction.

(c) From (8.25), we get

|f^{(n)}(z0 )| = |(n!/2πi) ∮_C f (z)/(z − z0 )^{n+1} dz| ≤ (n!/2π) M (1/r^{n+1}) 2πr

as |f (z)| ≤ M on C. Thus we have the desired result

|f^{(n)}(z0 )| ≤ n!M/r^n .
(d) Suppose an entire function f (z) is bounded, say |f (z)| < k for all z. Using
the inequality of part (c) with n = 1, we get

|f ′(z0 )| ≤ k/r.

Since f (z) is entire, this holds for every r, so we can choose r as large as
we please and conclude that f ′(z) = 0 for all z. This means that f (z) is
constant.
(e) Suppose P (z) ≠ 0 for all z. This implies that Q(z) = 1/P (z) is an entire function.
Since P is non-constant, |Q(z)| → 0 as |z| → ∞, so Q(z) is bounded on the whole
plane. Thus by part (d), Q(z) is constant and so P (z) is constant; this contradicts
the assumption that P (z) is non-constant. Hence P (z) = 0 for at least one z.
(f ) Let
F (z) = ∫_{z0}^{z} f (s)ds,
taken along any path in D from a fixed point z0 to z; the hypothesis makes this
integral path independent. It can be checked that F (z) is analytic with
F ′(z) = f (z), and by part (b) F ′(z) is analytic, so f (z) = F ′(z) is analytic
in D. This proves Morera’s theorem.
Example 280. Evaluate the following integrals applying Cauchy’s integral
formula for derivatives and functions:
(a) ∮_C (z + 1)/(z⁴ + 4z³) dz, where C is the circle |z| = 1.
(b) ∮_C (1 + 2e^z)/z dz, where C is |z| = 1.
(c) ∮_C e^{z³}/(z − i)³ dz, where C is a closed curve not passing through i.

Solution: (a)
f (z) = (z + 1)/(z⁴ + 4z³)
is not analytic at z = 0 and z = −4, and only z = 0 lies inside the contour |z| = 1.
f (z) can be written as
f (z) = [(z + 1)/(z + 4)] / z³.
We can identify z0 = 0, n = 2 and
g(z) = (z + 1)/(z + 4).
By the quotient rule,
g ′′(z) = −6/(z + 4)³
and so by Cauchy’s integral formula for derivatives we have

∮_C (z + 1)/(z⁴ + 4z³) dz = (2πi/2!) g ′′(0) = −(3π/32) i.
(b) By Cauchy’s integral formula for
f (z) = 1 + 2e^z ,

∮_C (1 + 2e^z)/z dz = 2πi f (0) = 2πi(1 + 2e⁰) = 6πi.
(c) If C does not enclose i then this integral is zero by Cauchy’s theorem, since
the only point at which
f (z) = e^{z³}/(z − i)³
fails to be differentiable is i. Assume that C encloses i. Because the factor
z − i occurs to the third power in the denominator, use n = 2 in Cauchy’s
integral formula for derivatives with f (z) = e^{z³} to obtain

∮_C e^{z³}/(z − i)³ dz = (2πi/2!) f ′′(i) = πi f ′′(i).

We have
f ′(z) = 3z² e^{z³}
and
f ′′(z) = 6z e^{z³} + 9z⁴ e^{z³}
so
∮_C e^{z³}/(z − i)³ dz = πi[6ie^{−i} + 9e^{−i}] = (−6 + 9i)πe^{−i}.

8.4 Residues and Residue Theorem


8.4.1 Introduction
A function of complex variable f (z) has two kinds of representation in
series of powers of (z − z0 ). If f (z) is differentiable at z0 then expansion of
f (z) will contain only non-negative integer powers of (z − z0 ) and hence is
a power series. If f (z) is not differentiable at z0 then the expansion of f (z)
will contain negative and positive powers of (z − z0 ). See Figure 8.17. The
expansion is called the Laurent expansion or Laurent series. The open
set of points between two concentric circles |z − z0 | = r and |z − z0 | = R
is called an annulus, described by the inequalities

FIGURE 8.17: Circles | z − z0 |= r and | z − z0 |= R

r <| z − z0 |< R,
where r is the radius of the inner circle and R is the radius of the outer circle.
If r = 0 the annulus is a punctured disk (an open disk with the center removed).
The set 0 <| z − z0 |< ∞ represents the entire complex plane except z0 , and
r <| z − z0 |< ∞ contains all points outside the inner circle of radius r. The notion
of Laurent series leads to the concept of a residue, which in turn provides another
way to evaluate complex and real integrals.
Taylor’s theorem states that

f (z) = ∑_{k=0}^∞ [f^{(k)}(z0 )/k!] (z − z0 )^k ,

where f (z) is analytic within a domain D and z0 is a point in D, for all z
belonging to the largest circle C with center at z0 and radius R that lies
entirely within D.

Laurent expansion
Let f be analytic within an annulus D defined by r <| z − z0 |< R. Then f is
represented by the series

f (z) = ∑_{k=−∞}^∞ a_k (z − z0 )^k (8.33)

valid for r <| z − z0 |< R. The coefficients a_k are

a_k = (1/2πi) ∮_C f (t)/(t − z0 )^{k+1} dt, k = 0, ±1, ±2, . . . (8.34)

where C is a simple closed curve that lies entirely within D and has z0 in its
interior, see Figure 8.18.
It may be noted that the Laurent expansion is a generalization of Taylor’s
series: when a_{−k} = 0 for k = 1, 2, 3, . . . , the Laurent expansion reduces to
Taylor’s series.
Remark 70. Finding the coefficients a_k of a Laurent expansion using (8.34)
is not always an easy task. Very often the geometric series

1/(1 − z) = 1 + z + z² + z³ + . . .
and
1/(1 + z) = 1 − z + z² − z³ + . . .

for | z |< 1 are helpful.
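Formula (8.34) can nonetheless be applied numerically. For example, f(z) = 1/(z(1 − z)) in 0 < |z| < 1 has Laurent series 1/z + 1 + z + z² + ... about z0 = 0 (by the first geometric series above), so every coefficient from a_{−1} upward is 1. A Python sketch (the circle radius and grid size are arbitrary):

```python
# Numerical Laurent coefficients a_k = (1/2*pi*i) * contour integral of
# f(t)/t**(k+1) over |t| = r, applied to f(z) = 1/(z*(1 - z)).
import cmath, math

def laurent_coeff(f, k, r=0.5, N=10_000):
    total = 0j
    for j in range(N):
        theta = 2 * math.pi * (j + 0.5) / N
        t = r * cmath.exp(1j * theta)
        dt = 1j * t
        total += f(t) / t ** (k + 1) * dt * (2 * math.pi / N)
    return total / (2j * math.pi)

f = lambda z: 1 / (z * (1 - z))
for k in (-1, 0, 1, 2):
    assert abs(laurent_coeff(f, k) - 1) < 1e-9
assert abs(laurent_coeff(f, -2)) < 1e-9  # no 1/z**2 term
print("computed coefficients match 1/z + 1 + z + z**2 + ...")
```
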

8.4.2 Singularities and Residues


If a complex function f (z) is not analytic at z = z0 then z0 is called a singu-
larity or a singular point of f (z). A singularity z0 of f (z) is called an iso-
lated singularity if there exists a punctured disk 0 <| z − z0 |< R about z0
throughout which f (z) is analytic. The Laurent expansion of f (z) can
be written as

f (z) = ∑_{k=1}^∞ a_{−k} (z − z0 )^{−k} + ∑_{k=0}^∞ a_k (z − z0 )^k . (8.35)

The first term in (8.35),

∑_{k=1}^∞ a_{−k}/(z − z0 )^k ,

is called the principal part of the Laurent expansion. Viewed as a power
series in 1/(z − z0 ), it converges for

|1/(z − z0 )| < 1/r,

that is, for
| z − z0 |> r.
The second term, ∑_{k=0}^∞ a_k (z − z0 )^k , in (8.35) is called the analytic part of
the Laurent expansion and it converges for | z − z0 |< R. Thus the sum of
these two parts converges if r <| z − z0 | and | z − z0 |< R, that is, the
Laurent expansion converges in the annulus r <| z − z0 |< R.

FIGURE 8.18: Closed Simple Curve Enclosing z0 and Lying in Annulus r <|
z − z0 |< R

Example 281. Examine the nature of the singularities of

f (z) = z/(z² + 9).

Solution: z = 3i and z = −3i are singularities of the function
f (z) = z/(z² + 9).
Both are isolated singularities since f is analytic at every point of the punctured
disks 0 <| z − 3i |< 1 and 0 <| z + 3i |< 1.
Classification of isolated singular points
An isolated singularity z = z0 of f (z) is called a removable singularity
if the principal part in the Laurent expansion is zero, that is, all coefficients
a_{−k} in (8.35) are zero. An isolated singularity z = z0 is called a pole if the
principal part in the Laurent expansion contains a finite number of non-zero
terms; it is a pole of order n if a_{−n} ≠ 0 and a_{−k} = 0 for all k > n.
If n = 1, that is, a_{−1} is the only non-zero coefficient of the principal part, then z = z0
is called a simple pole. z = z0 is called an essential singularity if the principal
part has infinitely many non-zero terms. The forms of the Laurent series (expansion)
about z = z0 are

1. If z = z0 is a removable singularity then the Laurent expansion takes
the form a0 + a1 (z − z0 ) + a2 (z − z0 )² + . . .
2. If z = z0 is a pole of order n then the Laurent series takes the form

a_{−n}/(z − z0 )^n + a_{−(n−1)}/(z − z0 )^{n−1} + · · · + a_{−1}/(z − z0 ) + a0 + a1 (z − z0 ) + . . .

3. If z = z0 is an essential singularity then the Laurent series takes the
form

· · · + a_{−2}/(z − z0 )² + a_{−1}/(z − z0 ) + a0 + a1 (z − z0 ) + a2 (z − z0 )² + . . .

Let us recall that z0 is a zero of a function f if f (z0 ) = 0. An analytic
function f has a zero of order n at z = z0 if f (z0 ) = 0, f ′(z0 ) = 0, f ′′(z0 ) =
0, . . . , f^{(n−1)}(z0 ) = 0 but f^{(n)}(z0 ) ≠ 0.

Example 282. (a) (i) Show that (sin z)/z has a removable singularity at z = 0.
(ii) Show that f (z) = (sin z)/z² has a simple pole at z = 0.
(b) Show that z = 0 is a removable singularity of

f (z) = (e^{2z} − 1)/z.

(c) Determine the zeros and their order for the functions
(i) f (z) = (z + 2 − i)²
(ii) f (z) = z + 9/z
(iii) f (z) = e^{2z} − e^z
Solutions: (a) (i) We know that

sin z = z − z³/3! + z⁵/5! − . . .
or
(sin z)/z = 1 − z²/3! + z⁴/5! − . . .

The right hand side is of the form
a0 + a1 (z − z0 ) + a2 (z − z0 )² + · · · for z0 = 0.
Hence z = 0 is a removable singularity of (sin z)/z.
z
(ii) We have
(sin z)/z² = 1/z − z/3! + z³/5! − . . .
for 0 <| z |. Since a_{−1} = 1 ≠ 0 and all other coefficients of the principal part are
zero, z = 0 is a simple pole of the function f (z) = (sin z)/z².
(b) Since
e^{2z} = 1 + 2z/1! + 2²z²/2! + 2³z³/3! + . . . ,
we get

(e^{2z} − 1)/z = [(1 + 2z/1! + 2²z²/2! + 2³z³/3! + . . . ) − 1]/z
= 2/1! + (2²/2!)z + (2³/3!)z² + . . .

z = 0 is a removable singularity of the function f (z) = (e^{2z} − 1)/z, as the
expansion is of the form
a0 + a1 (z − z0 ) + a2 (z − z0 )² + . . .
for z0 = 0.
(c) (i) f (z) = (z + 2 − i)² has a zero at z = −2 + i. It is of order 2, as

f ′(z) = 2(z + 2 − i) = 0 at z = −2 + i

but f ′′(z) = 2 ≠ 0. Hence z = −2 + i is a zero of order 2.


(ii) We have
f (z) = z + 9/z = (z² + 9)/z = (z + 3i)(z − 3i)/z
and
f ′(z) = [2z² − (z² + 9)]/z² = (z² − 9)/z² = 1 − 9/z².

Since f (3i) = 0 and f (−3i) = 0, z = 3i and z = −3i are zeros of the
given function. The order of each of these zeros is 1, as f ′(3i) = 2 ≠ 0 and
f ′(−3i) = 2 ≠ 0.
(iii)
f (z) = e^{2z} − e^z = e^z (e^z − 1)
and the zeros of e^z − 1 give
z = 2πni, n = 0, ±1, ±2, ±3, . . .

as the zeros of f (z). Since

f ′(z) = 2e^{2z} − e^z ,  f ′(2πni) = 2e^{4πni} − e^{2πni} = 2 − 1 = 1 ≠ 0,

by the definition each of

z = 2πni, n = 0, ±1, ±2, ±3, . . .

is a zero of order 1.
Definition 95. (Residue) The coefficient a_{−1} of 1/(z − z0 ) in the Laurent series
expansion of f (z),

f (z) = ∑_{k=−∞}^∞ a_k (z − z0 )^k
= · · · + a_{−2}/(z − z0 )² + a_{−1}/(z − z0 ) + a0 + a1 (z − z0 ) + a2 (z − z0 )² + . . . ,

where f has an isolated singularity at the point z0 , is called the residue of f
at z0 and is often denoted by Res(f (z), z0 ).
The following theorems provide techniques for computing residues for a
simple pole and pole of order n.
Theorem 65. (a) If f has a simple pole at z = z0 then

Res(f (z), z0 ) = limz→z0 (z − z0 )f (z). (8.36)

(b) If f (z) = g(z)/h(z), where g and h are analytic at z = z0 , g(z0 ) ≠ 0 and
the function h has a zero of order 1 at z0 , then f has a simple pole at z = z0
and
Res(f (z), z0 ) = g(z0 )/h′(z0 ).
Theorem 66. If f has a pole of order n at z = z0 , then

Res(f (z), z0 ) = [1/(n − 1)!] limz→z0 d^{n−1}/dz^{n−1} [(z − z0 )^n f (z)]. (8.37)
Proof. (a) For a simple pole z = z0 the Laurent expansion of f (z) about z0
is given by
f (z) = a_{−1}/(z − z0 ) + a0 + a1 (z − z0 ) + a2 (z − z0 )² + · · ·
Multiplying both sides by (z − z0 ) and taking the limit,

limz→z0 (z − z0 )f (z) = limz→z0 [a_{−1} + a0 (z − z0 ) + a1 (z − z0 )² + a2 (z − z0 )³ + . . . ]
= a_{−1} = Res(f (z), z0 ).
564 Modern Engineering Mathematics

(b) By part (a) we have

lim_{z→z_0} (z − z_0) f(z) = Res(f(z), z_0),

and by the assumption on h(z) we have h(z_0) = 0 and

lim_{z→z_0} [h(z) − h(z_0)]/(z − z_0) = h'(z_0).

Therefore

Res(f(z), z_0) = lim_{z→z_0} (z − z_0) g(z)/h(z) = lim_{z→z_0} g(z) / {[h(z) − h(z_0)]/(z − z_0)} = g(z_0)/h'(z_0).

Since f has a pole of order n, its Laurent expansion for 0 < |z − z_0| < R is of the form

f(z) = a_{−n}/(z − z_0)^n + · · · + a_{−2}/(z − z_0)^2 + a_{−1}/(z − z_0) + a_0 + a_1(z − z_0) + . . . .

Multiplying the last expression by (z − z_0)^n and differentiating the result (n − 1) times, we get

d^{n−1}/dz^{n−1} [(z − z_0)^n f(z)] = (n − 1)! a_{−1} + n! a_0 (z − z_0)/1! + . . . .

By taking the limit as z → z_0 of both sides we get

lim_{z→z_0} d^{n−1}/dz^{n−1} [(z − z_0)^n f(z)] = (n − 1)! a_{−1},

as all terms on the right hand side other than the first vanish. This gives us

a_{−1} = Res(f(z), z_0) = 1/(n − 1)! lim_{z→z_0} d^{n−1}/dz^{n−1} [(z − z_0)^n f(z)].

This proves the theorem.
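Theorems 65 and 66 can be cross-checked against the defining coefficient formula a_{−1} = (1/2πi) ∮ f(z) dz over a small circle about z_0. The following Python sketch is illustrative and not from the text; the test function e^z/z^3, a pole of order 3 with residue 1/2, is our own choice:

```python
import cmath

def circle_integral(f, z0, r=1.0, n=2000):
    # Midpoint-rule approximation of the counterclockwise contour
    # integral of f over the circle |z - z0| = r; for smooth periodic
    # integrands this rule converges extremely fast.
    total = 0j
    for k in range(n):
        t = 2 * cmath.pi * (k + 0.5) / n
        z = z0 + r * cmath.exp(1j * t)
        dz = 1j * r * cmath.exp(1j * t) * (2 * cmath.pi / n)
        total += f(z) * dz
    return total

# f(z) = e^z / z^3: pole of order 3 at 0; by Theorem 66 the residue is
# (1/2!) d^2/dz^2 [e^z] at 0 = 1/2, the coefficient of z^2 in e^z.
f = lambda z: cmath.exp(z) / z**3
residue = circle_integral(f, 0) / (2j * cmath.pi)
assert abs(residue - 0.5) < 1e-9
```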
Theorem 67. (Residue theorem) Let f(z) be analytic inside a simple closed curve C and on C, except for finitely many singularities z_1, z_2, . . . , z_n inside C. Then the integral of f(z) taken counterclockwise around C equals 2πi times the sum of the residues of f(z) at z_1, z_2, . . . , z_n, namely

∮_C f(z) dz = 2πi Σ_{k=1}^{n} Res(f(z), z_k).

Proof. We enclose each of the singularities z_k in a circle C_k with radius small enough that these n circles and C are all separated (see Figure 8.19). Then f(z) is analytic in the multiply connected domain D bounded by C and C_1, C_2, . . . , C_n and on the entire boundary of D. By Cauchy's integral theorem we get

∮_C f(z) dz + ∮_{C_1} f(z) dz + ∮_{C_2} f(z) dz + · · · + ∮_{C_n} f(z) dz = 0,

where the integral on C is taken counterclockwise and the other integrals clockwise. By taking the integrals on C_1, C_2, . . . , C_n also counterclockwise we get

∮_C f(z) dz = ∮_{C_1} f(z) dz + ∮_{C_2} f(z) dz + · · · + ∮_{C_n} f(z) dz.    (8.38)

By the Laurent expansion (8.35) and Cauchy's integral formula (8.34), the residue of f(z) at z = z_k is

a_{−1} = 1/(2πi) ∮_{C_k} f(z) dz = Res(f(z), z_k),

or

∮_{C_k} f(z) dz = 2πi Res(f(z), z_k),   k = 1, 2, . . . , n.    (8.39)

By (8.38) and (8.39), we get the desired result

∮_C f(z) dz = 2πi Σ_{k=1}^{n} Res(f(z), z_k).    (8.40)

FIGURE 8.19: Residue Theorem

Remark 71. The evaluation of integrals using the residue theorem depends on the determination of residues at singular points. The residue theorem is therefore most useful for functions whose residues at their singular points can be found without tedious computation.
Example 283. (a) Find the residue of f(z) = 1/[(z − 1)^2 (z − 3)] at z = 1.

(b) Find the residue of f(z) = e^{3/z} at z = 0.

(c) Find the residue of f(z) = (z − 6i)/[(z − 2)^2 (z + 4i)] at z = −4i.

(d) Find the residue of f(z) = (4iz − 1)/sin z at z = π.
Solution: (a) The Laurent expansion of f(z) about z = 1 is

f(z) = −1/[2(z − 1)^2] − 1/[4(z − 1)] − 1/8 − (z − 1)/16 − . . . .

The principal part is

−1/[2(z − 1)^2] − 1/[4(z − 1)],

implying a_{−1} is −1/4. Hence

Res(f(z), 1) = −1/4.
(b) The Laurent expansion of f(z) = e^{3/z} about z = 0 is

e^{3/z} = 1 + 3/z + 3^2/(2! z^2) + 3^3/(3! z^3) + . . . ,

so a_{−1} = Res(f(z), 0) = 3.
(c) The given function has a simple pole at z = −4i. By Theorem 65(a),

Res(f(z), −4i) = lim_{z→−4i} (z + 4i) f(z) = lim_{z→−4i} (z − 6i)/(z − 2)^2 = (−4i − 6i)/(−4i − 2)^2 = −2/5 + (3/10)i.

(d) By Theorem 65(b),

Res(f, z_0) = g(z_0)/h'(z_0),

with g(z) = 4iz − 1, h(z) = sin z and z_0 = π. Hence

Res(f, π) = (4iπ − 1)/cos π = 1 − 4πi.
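The residues in parts (c) and (d) can be confirmed with a small-circle contour integral. The Python sketch below is illustrative and not part of the text:

```python
import cmath

def residue_at(f, z0, r=0.5, n=4000):
    # a_{-1} = (1/2πi) ∮ f(z) dz over a small circle around z0
    total = 0j
    for k in range(n):
        t = 2 * cmath.pi * (k + 0.5) / n
        z = z0 + r * cmath.exp(1j * t)
        total += f(z) * 1j * r * cmath.exp(1j * t) * (2 * cmath.pi / n)
    return total / (2j * cmath.pi)

# (c): Res((z - 6i)/((z - 2)^2 (z + 4i)), -4i) = -2/5 + (3/10)i
fc = lambda z: (z - 6j) / ((z - 2) ** 2 * (z + 4j))
assert abs(residue_at(fc, -4j) - (-0.4 + 0.3j)) < 1e-9

# (d): Res((4iz - 1)/sin z, π) = 1 - 4πi
fd = lambda z: (4j * z - 1) / cmath.sin(z)
assert abs(residue_at(fd, cmath.pi) - (1 - 4j * cmath.pi)) < 1e-9
```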
Example 284. (a) Find Res(f(z), 1), where

f(z) = 2/[(z − 1)(z + 4)].

(b) Find Res(f(z), 0), where

f(z) = e^{−2/z^2}.

Solution: (a) The Laurent expansion of the given function at z = 1 is

f(z) = 2/[(z − 1)(5 + (z − 1))]
     = 2/[5(z − 1)] · [1 − (z − 1)/5 + (z − 1)^2/5^2 − (z − 1)^3/5^3 + . . . ]
     = (2/5) · 1/(z − 1) − 2/25 + 2(z − 1)/5^3 − 2(z − 1)^2/5^4 + . . . ,

so

Res(f(z), 1) = a_{−1} = 2/5.
(b) The Laurent expansion of f(z) at z = 0 is

f(z) = e^{−2/z^2} = 1 − 2/z^2 + 2^2/(2! z^4) − 2^3/(3! z^6) + . . . ,

which contains no 1/z term, implying Res(f(z), 0) = 0.
Example 285. Evaluate by the residue theorem

∮_C e^z/(z^4 + 5z^3) dz,

where C is the circle |z| = 2.

Solution: Since z^4 + 5z^3 = z^3(z + 5), the given function

f(z) = e^z/(z^4 + 5z^3)

has singularities at z = 0 and z = −5. Only z = 0 (a pole of order 3) lies inside the given circle, so

∮_C f(z) dz = ∮_C e^z/(z^4 + 5z^3) dz = 2πi Res(f(z), 0)

by the residue theorem (Theorem 67). By Theorem 66,

Res(f(z), 0) = 1/2! lim_{z→0} d^2/dz^2 [z^3 · e^z/(z^4 + 5z^3)] = 1/2 lim_{z→0} d^2/dz^2 [e^z/(z + 5)]

or

Res(f(z), 0) = 1/2 lim_{z→0} (z^2 + 8z + 17) e^z/(z + 5)^3 = 1/2 · 17/125 = 17/250.

Therefore,

∮_C f(z) dz = 2πi · 17/250 = (17π/125) i.
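The value 17πi/125 can be verified directly by numerical integration around |z| = 2 (illustrative Python, not from the text):

```python
import cmath

def contour_integral(f, center, r, n=4000):
    # midpoint-rule approximation of the counterclockwise integral
    total = 0j
    for k in range(n):
        t = 2 * cmath.pi * (k + 0.5) / n
        z = center + r * cmath.exp(1j * t)
        total += f(z) * 1j * r * cmath.exp(1j * t) * (2 * cmath.pi / n)
    return total

f = lambda z: cmath.exp(z) / (z ** 4 + 5 * z ** 3)
value = contour_integral(f, 0, 2.0)
expected = (17 * cmath.pi / 125) * 1j   # 2πi · Res(f, 0) = 2πi · 17/250
assert abs(value - expected) < 1e-9
```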
Example 286. Find the contour integrals of the following functions, taking the contour to be the circle |z| = 2 in (i) and |z| = 3 in (ii):
(i) tan z = sin z/cos z.

(ii) 1/[(z − 1)(z + 2)^2].
Solution: (i) The integrand

tan z = sin z/cos z

has simple poles at the points where cos z = 0. The only zeros of cos z are the real numbers

z = (2n + 1)π/2, n = 0, ±1, ±2, ±3, . . . ,

and of these only −π/2 and π/2 lie inside the circle |z| = 2. Thus

∮_{|z|=2} tan z dz = 2πi [Res(f(z), −π/2) + Res(f(z), π/2)].

By Theorem 65(b), with g(z) = sin z, h(z) = cos z and h'(z) = −sin z,

Res(f(z), π/2) = sin(π/2)/(−sin(π/2)) = −1

and

Res(f(z), −π/2) = sin(−π/2)/(−sin(−π/2)) = −1.

Hence

∮_{|z|=2} tan z dz = 2πi(−1 − 1) = −4πi.

(ii) By the residue theorem,

∮_C 1/[(z − 1)(z + 2)^2] dz = 2πi [Res(f(z), 1) + Res(f(z), −2)],

where

Res(f(z), 1) = 1/9, Res(f(z), −2) = −1/9.

Thus

∮_C 1/[(z − 1)(z + 2)^2] dz = 2πi (1/9 − 1/9) = 0.
Example 287. Evaluate

∮_C (z + 1)/[z^2 (z + 2)^2] dz

using the residue theorem; C is the circle |z| = 3.

Solution: The singularities z = 0 and z = −2 are both poles of order 2 lying inside C. By Theorem 66,

Res(f(z), 0) = lim_{z→0} d/dz [(z + 1)/(z + 2)^2] = lim_{z→0} [−z/(z + 2)^3] = 0,

Res(f(z), −2) = lim_{z→−2} d/dz [(z + 1)/z^2] = lim_{z→−2} [−(z + 2)/z^3] = 0.

Hence

∮_C (z + 1)/[z^2 (z + 2)^2] dz = 2πi [Res(f(z), 0) + Res(f(z), −2)] = 0.
Example 288. Evaluate

∮_C (z + 1)/[z^2 (z − 2i)] dz

by applying the residue theorem over the circle C: |z − 2i| = 1.

Solution: z = 0 and z = 2i are singular points, but z = 0 does not lie inside |z − 2i| = 1; z = 2i is a simple pole inside C. Thus

Res(f(z), 2i) = lim_{z→2i} (z − 2i) (z + 1)/[z^2 (z − 2i)] = (2i + 1)/(2i)^2 = −1/4 − (1/2)i

and

∮_C f(z) dz = 2πi (−1/4 − (1/2)i) = π − (π/2)i.
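As before, the result can be confirmed numerically over the circle |z − 2i| = 1 (illustrative Python, not from the text):

```python
import cmath

def contour_integral(f, center, r, n=4000):
    # midpoint-rule approximation of the counterclockwise integral
    total = 0j
    for k in range(n):
        t = 2 * cmath.pi * (k + 0.5) / n
        z = center + r * cmath.exp(1j * t)
        total += f(z) * 1j * r * cmath.exp(1j * t) * (2 * cmath.pi / n)
    return total

f = lambda z: (z + 1) / (z ** 2 * (z - 2j))
value = contour_integral(f, 2j, 1.0)     # circle |z - 2i| = 1
expected = cmath.pi - (cmath.pi / 2) * 1j
assert abs(value - expected) < 1e-9
```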

8.5 Application of Residue Theory for Evaluation of Real Integrals

We use here the residue theory discussed above to evaluate integrals of the types

(a) ∫_0^{2π} F(cos θ, sin θ) dθ

(b) ∫_{−∞}^{∞} f(x) dx

(c) ∫_{−∞}^{∞} f(x) cos αx dx and ∫_{−∞}^{∞} f(x) sin αx dx

8.5.1 Evaluation of Integrals Involving Trigonometric Functions

We want to evaluate integrals of the type

∫_0^{2π} F(cos θ, sin θ) dθ.    (8.41)

First we convert the integral into a complex integral over the unit circle centered at the origin. The unit circle can be expressed in the form

z = cos θ + i sin θ = e^{iθ}, 0 ≤ θ ≤ 2π.

Applying dz = i e^{iθ} dθ and

cos θ = (e^{iθ} + e^{−iθ})/2, sin θ = (e^{iθ} − e^{−iθ})/(2i),

we replace dθ, cos θ and sin θ by

dθ = dz/(iz), cos θ = (1/2)(z + z^{−1}), sin θ = (1/2i)(z − z^{−1}).

The integral (8.41) takes the form

∫_0^{2π} F(cos θ, sin θ) dθ = ∮_{|z|=1} F((1/2)(z + z^{−1}), (1/2i)(z − z^{−1})) dz/(iz).

Finding the real integral is thus equivalent to evaluating the contour integral

∮_{|z|=1} f(z) dz, where f(z) = F((1/2)(z + z^{−1}), (1/2i)(z − z^{−1})) · 1/(iz).

By the residue theorem,

∮_{|z|=1} f(z) dz = 2πi Σ_p Res(f(z), p).    (8.42)

The sum on the right hand side is taken over all of the poles p of f(z) enclosed by the unit circle.
Example 289. Evaluate

∫_0^{2π} 1/(10 − 6 cos θ) dθ

using the residue theorem.

Solution: With z = e^{iθ},

∫_0^{2π} 1/(10 − 6 cos θ) dθ = ∮_{|z|=1} 1/(10 − 3(z + z^{−1})) · dz/(iz)
                             = (1/i) ∮_{|z|=1} dz/(10z − 3z^2 − 3)
                             = −(1/i) ∮_{|z|=1} dz/(3z^2 − 10z + 3)
                             = −(1/i) ∮_{|z|=1} dz/[(3z − 1)(z − 3)].

Note that z = 1/3 and z = 3 are singular points but z = 3 does not lie inside the unit circle. Therefore by the residue theorem

∮_{|z|=1} dz/[(3z − 1)(z − 3)] = 2πi Res(f(z), 1/3)
                               = 2πi lim_{z→1/3} (z − 1/3) · 1/[3(z − 1/3)(z − 3)]
                               = 2πi · (1/3) lim_{z→1/3} 1/(z − 3)
                               = 2πi · (1/3)(−3/8) = −(π/4) i.

Therefore,

∫_0^{2π} 1/(10 − 6 cos θ) dθ = −(1/i) · (−(π/4) i) = π/4.
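A direct numerical check of the result (illustrative Python; a plain midpoint sum over [0, 2π] suffices, since the integrand is smooth and periodic):

```python
import math

# ∫_0^{2π} dθ/(10 - 6 cos θ) should equal π/4.
n = 20000
h = 2 * math.pi / n
total = sum(h / (10 - 6 * math.cos((k + 0.5) * h)) for k in range(n))
assert abs(total - math.pi / 4) < 1e-10
```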
Example 290. Evaluate

∫_0^{π} 1/(λ + μ cos θ) dθ,

where 0 < μ < λ.

Solution: Write

∫_0^{2π} 1/(λ + μ cos θ) dθ = ∫_0^{π} 1/(λ + μ cos θ) dθ + ∫_{π}^{2π} 1/(λ + μ cos θ) dθ.

Let w = 2π − θ in the second integral on the right hand side to obtain

∫_{π}^{2π} 1/(λ + μ cos θ) dθ = ∫_{π}^{0} 1/(λ + μ cos(2π − w)) (−1) dw = ∫_0^{π} 1/(λ + μ cos w) dw.

Therefore

∫_0^{2π} 1/(λ + μ cos θ) dθ = 2 ∫_0^{π} 1/(λ + μ cos θ) dθ,

that is,

∫_0^{π} 1/(λ + μ cos θ) dθ = (1/2) ∫_0^{2π} 1/(λ + μ cos θ) dθ = (1/2) ∮_{|z|=1} f(z) dz,

where

f(z) = 1/[(λ + (μ/2)(z + z^{−1})) iz] = −2i/(μz^2 + 2λz + μ).

f(z) has simple poles at

z = [−λ ± sqrt(λ^2 − μ^2)]/μ.

Since λ > μ, these numbers are real. Only one of them,

z_1 = [−λ + sqrt(λ^2 − μ^2)]/μ,

is enclosed by the unit circle |z| = 1. The other is outside the unit circle and is irrelevant for our purpose. Then

∫_0^{π} 1/(λ + μ cos θ) dθ = (1/2) · 2πi Res(f, z_1) = πi · (−2i)/(2μz_1 + 2λ) = π/sqrt(λ^2 − μ^2).
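The closed form can be spot-checked numerically for sample parameter values (λ = 2, μ = 1 is our own illustrative choice, not from the text):

```python
import math

# ∫_0^π dθ/(λ + μ cos θ) should equal π / sqrt(λ^2 - μ^2) for 0 < μ < λ.
lam, mu = 2.0, 1.0
n = 100000
h = math.pi / n
total = sum(h / (lam + mu * math.cos((k + 0.5) * h)) for k in range(n))
assert abs(total - math.pi / math.sqrt(lam ** 2 - mu ** 2)) < 1e-8
```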
2

Example 291. Evaluate

∫_0^{2π} sin^2 θ/(5 + 4 cos θ) dθ.

Solution: As discussed above, with z = e^{iθ},

∫_0^{2π} sin^2 θ/(5 + 4 cos θ) dθ = −(1/4i) ∮_{|z|=1} (z^2 − 1)^2/[z^2(2z^2 + 5z + 2)] dz
                                  = −(1/4i) · 2πi [Res(f(z), 0) + Res(f(z), −1/2)]
                                  = π/4,

the third pole z = −2 lying outside the unit circle.

8.5.2 Evaluation of Improper Integrals ∫_{−∞}^{∞} f(x) dx

We define

∫_{−∞}^{∞} f(x) dx = lim_{R→∞} ∫_{−R}^{R} f(x) dx.

The limit on the right hand side is called the Cauchy principal value and is denoted by P.V. ∫_{−∞}^{∞} f(x) dx. If f(z) is a rational function whose denominator has degree at least two more than its numerator and which has no poles on the real axis, then

P.V. ∫_{−∞}^{∞} f(x) dx = lim_{R→∞} ∫_{−R}^{R} f(x) dx = 2πi Σ_{k=1}^{n} Res(f(z), z_k),

where z_1, . . . , z_n are the poles of f in the upper half plane.

Example 292. Evaluate

∫_0^{∞} dx/(1 + x^4).

Solution: Since the integrand 1/(1 + x^4) is an even function,

∫_0^{∞} dx/(1 + x^4) = (1/2) ∫_{−∞}^{∞} dx/(1 + x^4).

Furthermore,

f(z) = 1/(1 + z^4)

has four singular points z_1 = e^{πi/4}, z_2 = e^{3πi/4}, z_3 = e^{−3πi/4}, z_4 = e^{−πi/4}, which are simple poles; see Figure 8.20. The first two singular points lie in the upper half plane. From Theorem 65(b) we find the residues:

Res(f(z), z_1) = [1/(1 + z^4)']_{z=z_1} = [1/(4z^3)]_{z=z_1} = (1/4) e^{−3πi/4} = −(1/4) e^{πi/4},

Res(f(z), z_2) = [1/(1 + z^4)']_{z=z_2} = [1/(4z^3)]_{z=z_2} = (1/4) e^{−9πi/4} = (1/4) e^{−πi/4}.

Here we used e^{πi} = −1 and e^{−2πi} = 1. By Theorem 67,

FIGURE 8.20: Poles

∫_{−∞}^{∞} dx/(1 + x^4) = 2πi (−(1/4) e^{πi/4} + (1/4) e^{−πi/4})
                        = −(2πi/4)(e^{πi/4} − e^{−πi/4})
                        = −(πi/2) · 2i sin(π/4) = π sin(π/4) = π/√2,

using (e^{iθ} − e^{−iθ})/(2i) = sin θ with θ = π/4. Hence

∫_0^{∞} dx/(1 + x^4) = (1/2) · π/√2 = π/(2√2).
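A numerical check of ∫_0^∞ dx/(1 + x^4) = π/(2√2) (illustrative Python; the substitution x = tan t maps the infinite range onto a finite interval):

```python
import math

# Substituting x = tan t, dx = sec^2 t dt, maps [0, ∞) to [0, π/2) and
# keeps the transformed integrand bounded and smooth.
n = 100000
h = (math.pi / 2) / n
total = 0.0
for k in range(n):
    t = (k + 0.5) * h
    x = math.tan(t)
    total += h / ((1 + x ** 4) * math.cos(t) ** 2)
assert abs(total - math.pi / (2 * math.sqrt(2))) < 1e-8
```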
Example 293. Find the principal value of

∫_{−∞}^{∞} dx/[(x^2 − 3x + 2)(x^2 + 1)].

Solution: We have x^2 − 3x + 2 = (x − 2)(x − 1) and

x^2 + 1 = (x − i)(x + i).

The integrand f(x), considered for complex z, has simple poles at

z = 1, with Res(f(z), 1) = lim_{z→1} (z − 1)f(z) = −1/2,
z = 2, with Res(f(z), 2) = lim_{z→2} (z − 2)f(z) = 1/5,
z = i, with Res(f(z), i) = lim_{z→i} (z − i)f(z) = 1/[(1 − 3i)(2i)] = (3 − i)/20.

The pole z = −i lies in the lower half plane and is not relevant here. Since the simple poles z = 1 and z = 2 lie on the real axis, each contributes πi times its residue to the principal value, while the pole z = i in the upper half plane contributes 2πi times its residue:

P.V. ∫_{−∞}^{∞} dx/[(x^2 − 3x + 2)(x^2 + 1)] = πi(−1/2 + 1/5) + 2πi (3 − i)/20 = π/10.
Example 294. Evaluate the Cauchy principal value of the improper integral

∫_{−∞}^{∞} x^2/(x^2 + 1)^2 dx.

Solution: The only pole of f(z) = z^2/(z^2 + 1)^2 in the upper half plane is z = i (of order 2), so

∫_{−∞}^{∞} x^2/(x^2 + 1)^2 dx = 2πi Res(f(z), i) = 2πi · 1/(4i) = π/2.

Fourier integrals: We use the following formulas to evaluate Fourier integrals:

∫_{−∞}^{∞} f(x) cos αx dx = −2π Σ_p Im Res(f(z)e^{iαz}, p),    (8.43)

∫_{−∞}^{∞} f(x) sin αx dx = 2π Σ_p Re Res(f(z)e^{iαz}, p),    (8.44)

where p ranges over the poles of f(z)e^{iαz} in the upper half plane.


Example 295. Show that for α > 0,

∫_{−∞}^{∞} cos αx/(16 + x^2) dx = (π/4) e^{−4α},

∫_{−∞}^{∞} sin αx/(16 + x^2) dx = 0.

Solution: e^{iαz}/(16 + z^2) has only one pole in the upper half plane, namely a simple pole at z = 4i. By Theorem 65 we obtain

Res(e^{iαz}/(16 + z^2), 4i) = [e^{iαz}/(2z)]_{z=4i} = e^{−4α}/(8i).

Thus

∫_{−∞}^{∞} e^{iαx}/(16 + x^2) dx = 2πi · e^{−4α}/(8i) = (π/4) e^{−4α},

or

∫_{−∞}^{∞} (cos αx + i sin αx)/(16 + x^2) dx = (π/4) e^{−4α}.

Comparing real and imaginary parts gives

∫_{−∞}^{∞} cos αx/(16 + x^2) dx = (π/4) e^{−4α} and ∫_{−∞}^{∞} sin αx/(16 + x^2) dx = 0.
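The Fourier-integral value can be checked numerically for a sample α (α = 1 and the truncation point A = 400 are our own illustrative choices; the neglected tail is of order 1/A^2):

```python
import math

# ∫_{-∞}^{∞} cos(αx)/(16 + x^2) dx should equal (π/4) e^{-4α}.
alpha, A, n = 1.0, 400.0, 800000
h = 2 * A / n
total = 0.0
for k in range(n):
    x = (k + 0.5) * h - A
    total += h * math.cos(alpha * x) / (16 + x * x)
expected = (math.pi / 4) * math.exp(-4 * alpha)
assert abs(total - expected) < 1e-4
```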

8.6 Conformal Mappings


8.6.1 Complex Functions as Mappings

A complex function w = f(z) can be interpreted as a mapping, or transformation, from the z-plane to the w-plane; see Figure 8.21.

FIGURE 8.21: Mapping from z-Plane to w-Plane

Example 296. Find the images of the following lines under the mapping f(z) = z^2:

(a) Re z = 2
(b) Re z = 0
(c) Im z = 0

Solution: (a) For the function

f(z) = z^2 = (x + iy)^2 = x^2 − y^2 + 2ixy,

w = u(x, y) + iv(x, y) = x^2 − y^2 + 2ixy,

and comparing real and imaginary parts gives

u(x, y) = x^2 − y^2, v(x, y) = 2xy.

On the line Re z = 2 we have x = 2, so

u(x, y) = 4 − y^2 and v(x, y) = 4y.

These are parametric equations of a curve in the w-plane. Substituting y = (1/4)v into the first equation eliminates the parameter y to give

u = 4 − v^2/16.

In other words, the image of the vertical line x = 2 in Figure 8.22(a) is the parabola shown in Figure 8.22(b).


FIGURE 8.22: Image of x = 2 is Parabola
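The image curve can be checked pointwise (an illustrative Python sketch, not part of the text): every point z = 2 + iy on the line Re z = 2 is sent by w = z^2 onto the parabola u = 4 − v^2/16.

```python
# Points z = 2 + iy map to w = z^2 = (4 - y^2) + i(4y), and
# u = 4 - y^2, v = 4y satisfy u = 4 - v^2/16 identically.
for y in (-3.0, -1.0, 0.0, 0.5, 2.0):
    w = (2 + 1j * y) ** 2
    u, v = w.real, w.imag
    assert abs(u - (4 - v ** 2 / 16)) < 1e-12
```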


FIGURE 8.23: (a) z−plane (b) w− plane

(b) On the line Re z = 0 we have x = 0, so

f(z) = z^2 = x^2 − y^2 + 2ixy = −y^2,

that is, u(x, y) = −y^2 ≤ 0 and v(x, y) = 0: the image is the nonpositive real axis of the w-plane. (c) Similarly, on Im z = 0 we have y = 0 and f(z) = x^2 ≥ 0, so the image is the nonnegative real axis.
In short the complex function

w = f (z) = u(x, y) + iv(x, y)

may be considered the planar transformation u = u(x, y), v = v(x, y) and


w = f (z) is called the image of z under f .

Remark 72. (a) If z(t) = x(t) + iy(t), a ≤ t ≤ b, describes a curve C in a region of the z-plane, then

w = f(z(t)), a ≤ t ≤ b,

is a parametric representation of the image curve C′ in the w-plane.



(b) A point z on the level curve

u(x, y) = a

will be mapped to a point that lies on the vertical line u = a, and a point z
on the level curve
v(x, y) = b
will be mapped to a point w that lies on the horizontal line

v = b.

Examples of complex functions


Example 297. (a) Translation function:

T (z) = w = z + b,

where b is a fixed complex number.


(b) Magnification function: It is a complex function defined as f (z) = αz,
where α is a fixed positive real number.
(c) Exponential function: It is a complex function defined as

f (z) = ez .

(d) Rotation function: the function g(z) = w = e^{iθ_0} z is simply a rotation through the angle θ_0, for if

z = re^{iθ},

then

w = g(z) = re^{i(θ + θ_0)}.
(e) w = az with | a |= 1 is also a rotation function.
(f) f (z) = w = az + b where a and b are real or complex numbers is called a
linear function.
(g) w = f(z) = 1/z is called an inversion in the unit circle. This function has domain z ≠ 0 and real and imaginary parts:

w = u(x, y) + iv(x, y)
  = 1/(x + iy) = (1/(x + iy)) · ((x − iy)/(x − iy))
  = (x − iy)/(x^2 + y^2) = x/(x^2 + y^2) + i (−y)/(x^2 + y^2),

so

u(x, y) = x/(x^2 + y^2) and v(x, y) = −y/(x^2 + y^2).
For a ≠ 0 the level curve u(x, y) = a can be written as

x^2 − (1/a)x + y^2 = 0

or

(x − 1/(2a))^2 + y^2 = (1/(2a))^2.

The level curve is therefore a circle with its center on the x-axis and passing through the origin. A point z on this circle other than zero is mapped to a point w on the line u = a. Similarly, the level curve v(x, y) = b, b ≠ 0, can be written as

x^2 + (y + 1/(2b))^2 = (1/(2b))^2,

and a point z on this circle is mapped to a point w on the line v = b. It is clear that

f^{−1}(w) = 1/w,

since w = f(z) = 1/z gives z = 1/w, and so f^{−1} = f. Therefore we conclude that f maps the horizontal line y = b to the circle

u^2 + (v + 1/(2b))^2 = (1/(2b))^2

and the vertical line x = a to the circle

(u − 1/(2a))^2 + v^2 = (1/(2a))^2.
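These circle images are easy to confirm pointwise (illustrative Python; the sample value a = 2 is our own choice):

```python
# w = 1/z maps the vertical line x = a onto the circle
# (u - 1/(2a))^2 + v^2 = (1/(2a))^2 (minus the origin).
a = 2.0
for y in (-5.0, -1.0, 0.5, 3.0, 10.0):
    w = 1 / complex(a, y)
    u, v = w.real, w.imag
    assert abs((u - 1 / (2 * a)) ** 2 + v ** 2 - (1 / (2 * a)) ** 2) < 1e-12
```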
Example 298. (a) Find the image curve in the w-plane of the circle| z |= 1
under the complex function
1
w = f (z) = z + .
z
(b) Find the image region in the w-plane of the rectangle 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
under the complex function

w = f (z) = ez .

8.6.2 Conformal Mappings

Conformal mappings preserve the angles between two curves; the mappings of Example 297 are conformal. Let C_1 and C_2 be two curves in the z-plane intersecting at z = z_0, and let θ be the angle between C_1 and C_2 (the angle between the tangents to the two curves at z_0). Let the images of C_1 and C_2 under w = f(z) be Γ_1 and Γ_2 respectively, and let the angle between Γ_1 and Γ_2 be φ. Then f is a conformal mapping if θ = φ. See Figure 8.24.

FIGURE 8.24: w = f (z) is Conformal if θ = φ

Remark 73. Let z_1′ and z_2′ denote tangent vectors to the curves C_1 and C_2 respectively. Applying the law of cosines to the triangle determined by z_1′ and z_2′,

|z_1′ − z_2′|^2 = |z_1′|^2 + |z_2′|^2 − 2|z_1′||z_2′| cos θ.

Thus

cos θ = (|z_1′|^2 + |z_2′|^2 − |z_1′ − z_2′|^2)/(2|z_1′||z_2′|)

or

θ = cos^{−1} [(|z_1′|^2 + |z_2′|^2 − |z_1′ − z_2′|^2)/(2|z_1′||z_2′|)].    (8.45)

Let w_1′ and w_2′ denote tangent vectors to Γ_1 and Γ_2 respectively; then

Φ = cos^{−1} [(|w_1′|^2 + |w_2′|^2 − |w_1′ − w_2′|^2)/(2|w_1′||w_2′|)],    (8.46)

and w = f(z) is a conformal mapping if the angles given in (8.45) and (8.46) are equal, that is, θ = Φ.
The following theorem provides a simple criterion for a complex function to be a conformal mapping.

Theorem 68. A function of a complex variable f(z) defined on a domain D is a conformal mapping at z = z_0 if f(z) is analytic in D and

f'(z_0) ≠ 0.

Proof. Let a curve C in the domain D be parameterized by z = z(t); then w = f(z(t)) describes the image curve in the w-plane. By the chain rule applied to w, we get

w′ = f'(z(t)) z′(t).

Since the curves C_1 and C_2 intersect at z_0 in D, we have

w_1′ = f'(z_0) z_1′ and w_2′ = f'(z_0) z_2′.

Since f'(z_0) ≠ 0, we obtain

Φ = cos^{−1} [(|f'(z_0)z_1′|^2 + |f'(z_0)z_2′|^2 − |f'(z_0)z_1′ − f'(z_0)z_2′|^2)/(2|f'(z_0)z_1′||f'(z_0)z_2′|)]
  = cos^{−1} [|f'(z_0)|^2 (|z_1′|^2 + |z_2′|^2 − |z_1′ − z_2′|^2)/(2|f'(z_0)|^2 |z_1′||z_2′|)]
  = cos^{−1} [(|z_1′|^2 + |z_2′|^2 − |z_1′ − z_2′|^2)/(2|z_1′||z_2′|)] = θ.

Therefore Φ = θ and f is conformal at z = z_0.
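The proof reduces to the fact that both image tangent vectors are the original tangent vectors multiplied by the same nonzero number f'(z_0), which leaves the angle between them unchanged. A tiny numeric illustration (our own example, f(z) = z^2 at z_0 = 1 + i, not from the text):

```python
import cmath

z0 = 1 + 1j
t1, t2 = 1 + 0j, cmath.exp(1j * 0.7)    # two tangent directions at z0
theta = abs(cmath.phase(t2 / t1))       # angle between the curves

fprime = 2 * z0                          # f'(z0) for f(z) = z^2, nonzero
w1, w2 = fprime * t1, fprime * t2        # tangents of the image curves
phi = abs(cmath.phase(w2 / w1))          # angle between the images

assert abs(theta - 0.7) < 1e-12
assert abs(theta - phi) < 1e-12          # the angle is preserved
```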


Example 299. Examine whether the following complex mappings are con-
formal
(a) w = f (z) = sin z
(b) w = f (z) = cos z
(c) w = f (z) = 1 + ez + z

(d) w = f (z) = z 3 − 3z + 1
1
(e) w = f (z) = z +
z

8.6.3 Möbius Transforms


In this subsection we present a class of conformal transform, called the
Möbius transform or linear fractional transform, which provides very
efficient methods of finding one-to-one mappings of domains into one another.

Definition 96. For a, b, c, d real or complex numbers such that |a| + |c| > 0 and ad − bc ≠ 0, the transformation defined as

w = T(z) = (az + b)/(cz + d)    (8.47)

is called a Möbius or linear fractional transform.

Remark 74. (a) For c = 0, w = T(z) = (a/d)z + b/d; the Möbius transform is linear.

(b) If c ≠ 0 and a = 0 then the Möbius transform

T(z) = b/(cz + d)

is an inversion.

(c) In case ac ≠ 0, we can write

w = T(z) = (a/c) [1 + (bc − ad)/(a(cz + d))],

so the Möbius transform is a composition of a linear transformation and an inversion. The inverse of the mapping (8.47) is obtained by solving for z: from

cwz + wd = az + b, that is, z(cw − a) = b − wd,

we get

z = (b − wd)/(wc − a) = (dw − b)/(−cw + a).

When c ≠ 0, −cw + a = 0 for w = a/c, and we let a/c be the image of z = ∞. The complex plane together with the point ∞ is called the extended complex plane.

(d) Since

T'(z) = [a(cz + d) − (az + b)c]/(cz + d)^2 = (ad − bc)/(cz + d)^2 ≠ 0 for z ≠ −d/c,

every Möbius transform is a conformal mapping. We can summarize the above properties of the Möbius transform in the form of theorems.
Fixed points of complex mappings
Fixed points of a complex mapping w = f(z) are the points that are mapped to themselves, that is, those satisfying f(z) = z.

(a) The identity mapping I(z) = z has every point as a fixed point.

(b) The conjugation mapping w = f(z) = z̄ has infinitely many fixed points (every real number).

(c) The mapping w = f(z) = 1/z has two fixed points, z = ±1.

(d) A rotation mapping has one fixed point.

(e) A translation mapping has no fixed point in the complex plane.
Existence of fixed points for complex mappings can be described by the fol-
lowing theorems.
Theorem 69. A Möbius transform different from the identity has at most two
fixed points. Any Möbius transform having three or more fixed points must be
the identity mapping.

Theorem 70. Let T be a Möbius transform. Then


(a) T can be written as a composition of magnifications, rotations, transla-
tions and inversions.
(b) T maps the extended complex plane onto itself. It is a one-to-one map-
ping.
(c) T maps the class of circles and lines into itself.
(d) T is a conformal map at every point except its pole.
The following theorem (see proof in Appendix D if desired) describes how to find the Möbius transformation determined by three distinct points.

Theorem 71. Three given distinct points z_1, z_2, z_3 can always be mapped onto three prescribed distinct points w_1, w_2, w_3 by one and only one Möbius transform w = f(z), given implicitly by the equation

[(w − w_1)/(w − w_3)] · [(w_2 − w_3)/(w_2 − w_1)] = [(z − z_1)/(z − z_3)] · [(z_2 − z_3)/(z_2 − z_1)].

Example 300. Construct a Möbius transform that maps the points 1, 0 and −1 on the real axis to the points −1, 0, and 1 on the real axis.
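Theorem 71 translates directly into a small solver: compute the cross ratio on the z side, then solve the equation, which is linear in w. The helper below is an illustrative Python sketch, not part of the text; applied to the data of Example 300 it produces the mapping w = −z.

```python
def mobius_from_points(z1, z2, z3, w1, w2, w3):
    # Theorem 71: (w - w1)(w2 - w3)/((w - w3)(w2 - w1))
    #           = (z - z1)(z2 - z3)/((z - z3)(z2 - z1))
    def T(z):
        c = ((z - z1) * (z2 - z3)) / ((z - z3) * (z2 - z1))
        # (w - w1)(w2 - w3) = c (w - w3)(w2 - w1) is linear in w:
        return (w1 * (w2 - w3) - c * w3 * (w2 - w1)) / \
               ((w2 - w3) - c * (w2 - w1))
    return T

# Example 300: map 1, 0, -1 to -1, 0, 1.  The unique transform is w = -z.
T = mobius_from_points(1, 0, -1, -1, 0, 1)
for z in (2 + 3j, -5j, 0.25):
    assert abs(T(z) - (-z)) < 1e-12
```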
Example 301. (a) Find all Möbius transforms whose fixed points are −1 and
1.
(b) Find all Möbius transformations with no fixed points in the complex plane.

8.7 Applications
In this section we briefly indicate how the results of complex analysis can be used in areas such as electrostatic potentials, heat flow, fluid flow and tomography.

8.7.1 Electrostatics Potential


We have discussed earlier solutions of Laplace equation under different
initial and boundary conditions. The equation models gravitational fields,
electrostatic fields, heat conduction, compressible fluid flows and problems
of many other fields. The theory of solving the Laplace equation

∇^2 φ = ∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = 0    (8.48)

is called potential theory, and solutions having continuous second partial


derivatives are called harmonic functions.
Equation (8.48) can be solved by complex analysis because its solutions are
closely related to analytic complex functions as seen in Section 8.2. Transitions
from real to complex have the advantage that by the complex potential

F = φ + iψ

one can simultaneously handle equipotential lines φ = constant and their orthogonal trajectories, namely the lines of flow ψ = constant.
Furthermore, in solving the Dirichlet problem of finding a potential with given
boundary values the conformal mapping can be used. The electrical force of at-
traction or repulsion between charged particles is governed by Coulomb’s law.
This force is the gradient of a function φ called the electrostatic potential.
At any point free of charges, φ is a solution of the Laplace equation

∇2 φ = 0.

The surfaces φ(x, y) = constant are called equipotential surfaces. At each point
P the gradient of φ is perpendicular to the surface φ = constant through P ;
that is, the electrical force has the direction perpendicular to the equipoten-
tial surface. We consider here two-dimensional problems, that is, we consider
physical systems that lie in three-dimensional space, but are such that the po-
tential φ is independent of one of the space coordinates, so that φ depends
only on two coordinates, which are x and y and equipotential surfaces appear
as equipotential lines (curves) in the xy plane.
Complex potential: Let φ(x, y) be harmonic in some domain D and ψ(x, y)
a conjugate harmonic of φ in D. Then

F (z) = φ(x, y) + iψ(x, y)

is an analytic function of z = x + iy. This function F is called the complex


potential corresponding to the real potential φ. The use of F has two advan-
tages, one technical and the other physical. Technically, F is easier to handle
than its real or imaginary parts. Physically, ψ has a meaning. By conformality, the curves ψ = constant intersect the equipotential lines φ = constant in the xy-plane at right angles except where F'(z) = 0. Hence they have the direction of the electrical force and therefore are called lines of force. They are the
paths of moving charged particles (electrons in an electron microscope). As
we see complex potentials relate potential theory closely to complex analysis.

Conformal mapping is used to map a complicated domain into a simpler


one where solution of the Laplace equation is known or can be found more
easily. This solution is then mapped back to the given domain. We obtain
these results due to the fact that harmonic functions remain harmonic under
conformal mapping.

Theorem 72. Let φ* be harmonic in a domain D* in the w-plane. Suppose that

w = u + iv = f(z)

is analytic in a domain D in the z-plane and maps D conformally onto D*. Then the function

φ(x, y) = φ*(u(x, y), v(x, y))    (8.49)

is harmonic in D.

Proof. The composition of analytic functions is analytic (by the chain rule). Thus, taking a harmonic conjugate ψ* of φ* and forming the analytic function

F* = φ*(u, v) + iψ*(u, v),

we find that

F(z) = F*(f(z))

is analytic in D. Hence its real part φ(x, y) = Re F(z) is harmonic in D.

8.7.2 Heat Flow

Laplace's equation also models heat flow problems that are steady, that is, time independent. The heat equation

∂T/∂t = α^2 (∂^2 T/∂x^2 + ∂^2 T/∂y^2)    (8.50)

represents heat conduction in a body of homogeneous material, where T denotes temperature, t stands for time and α^2 is a positive constant depending on the material of the body. Therefore, if a problem is steady, so that

∂T/∂t = 0,

the heat equation reduces to Laplace's equation

∇^2 T = ∂^2 T/∂x^2 + ∂^2 T/∂y^2 = 0.    (8.51)

T(x, y) is called the heat potential. It is the real part of the complex heat potential

F(z) = T(x, y) + iψ(x, y).    (8.52)
The curves T(x, y) = constant are called isotherms (lines of constant temperature) and the curves ψ(x, y) = constant are called heat flow lines, because heat flows from higher to lower temperatures along them. Discussions in Section 8.7.1 can be reinterpreted as problems on heat flow.
It may be observed that to have a steady flow, the boundary of the domain of the heat flow must be kept at constant temperature by heating or cooling. The usefulness of conformal mappings and complex potentials may be demonstrated in the study of temperature distribution in a region comprising a cross section of a solid cylinder and in heat conduction in the upper half plane; for details see, for example, Section 16.3 of [7].

8.7.3 Fluid Flow

Laplace's equation plays an important role in hydrodynamics and in steady nonviscous fluid flow under certain physical conditions. We discuss here the relevance of complex functions and their integrals by modeling and analyzing fluid flow. Let the velocity vector V, representing the motion of the fluid, be written as

V = V_1 + iV_2.    (8.53)

The magnitude of V is denoted by |V| and its direction by Arg V, where

tan(Arg V) = V_2/V_1

at each point z = x + iy. Here V_1 and V_2 are the components of V in the directions of the x and y axes respectively.


FIGURE 8.25: Streamline



V is tangential to the path of the moving particles, called a streamline (Figure 8.25) of the motion. The function

F(z) = φ(x, y) + iψ(x, y)

is called the complex potential of the flow, and the curves ψ(x, y) = constant are the streamlines. ψ is called the stream function and φ is called the velocity potential. The curves φ(x, y) = constant are called equipotential lines. V is the gradient of φ,

V_1 = ∂φ/∂x, V_2 = ∂φ/∂y.

In fact, for F = φ + iψ,

F'(z) = ∂φ/∂x + i ∂ψ/∂x,

and

∂ψ/∂x = −∂φ/∂y

by the Cauchy-Riemann equations. These together yield

F'(z) = ∂φ/∂x − i ∂φ/∂y = V_1 − iV_2,

so the velocity is the complex conjugate of F'(z):

V = V_1 + iV_2 = conjugate of F'(z).
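A concrete illustration (our own example, not from the text): for the complex potential F(z) = z^2, describing flow in a right-angle corner, φ = x^2 − y^2 and ψ = 2xy, and the velocity field grad φ = (2x, −2y) is exactly the conjugate of F'(z) = 2z.

```python
def F_prime(z):
    return 2 * z             # F(z) = z^2, so F'(z) = 2z

for z in (1 + 2j, -0.5 + 0.25j, 3 - 1j):
    # V = grad φ = (φ_x, φ_y) = (2x, -2y) written as a complex number
    V = complex(2 * z.real, -2 * z.imag)
    assert abs(V - F_prime(z).conjugate()) < 1e-12
```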

Let us consider any smooth curve C in the z-plane, given by z(s) = x(s) + iy(s), where s is the arc length of C. Let V_t denote the component of V tangent to C; see Figure 8.26. The integral ∫_C V_t ds is called the circulation of the fluid along C. The quantity

∂V_2/∂x − ∂V_1/∂y

is called the vorticity of the flow, and

(1/2)(∂V_2/∂x − ∂V_1/∂y) = w(x, y)

is called the rotation. A fluid is called irrotational if w(x, y) = 0 throughout the flow; in other words, a fluid is irrotational if

∂V_2/∂x − ∂V_1/∂y = 0.


FIGURE 8.26: Tangential Component of Velocity with Respect to Curve C

If C is taken as a circle of radius r, then the mean tangential velocity of the fluid along C is the circulation divided by the circumference 2πr; by Green's theorem the circulation equals the integral of the vorticity ∂V_2/∂x − ∂V_1/∂y over the enclosed disk, so for small r this mean velocity is approximately r w(x, y).

A fluid is called incompressible if

∂V_1/∂x + ∂V_2/∂y = 0

in a region that is free of sources or sinks, that is, points at which fluid is produced or disappears. For a velocity field

V(x, y) = u(x, y) + iv(x, y),

the line integral

∮_C (−v dx + u dy)

is called the flux of V across a closed curve C when it is non-zero. The field is called solenoidal if this integral is zero for all closed curves.
The following theorem describes the connection between the velocity field of
a fluid and complex functions.

Theorem 73. Let u and v be continuous with continuous first and second partial derivatives in a simply connected domain D. Let ui + vj be irrotational and solenoidal in D. Then u and −v satisfy the Cauchy-Riemann equations in D, and

f(z) = u(x, y) − iv(x, y)

is a differentiable complex function on D. Conversely, if u and −v satisfy the Cauchy-Riemann equations on D, then ui + vj defines an irrotational, solenoidal flow on D.

Remark 75. Theorem 73 provides insight into irrotational and solenoidal flows: for an irrotational flow

curl(ui + vj) = 0,

and for a solenoidal flow

div(ui + vj) = 0.

The following theorem implies that any differentiable function

f(z) = φ(x, y) + iψ(x, y)

defined on a simply connected domain determines an irrotational solenoidal flow.
Theorem 74. Let f be a differentiable function defined on a domain D, and write f'(z) = u(x, y) − iv(x, y). Then ui + vj is an irrotational solenoidal flow on D. Conversely, if

V(x, y) = u(x, y)i + v(x, y)j

is an irrotational solenoidal vector field on a simply connected domain D, then there is a differentiable complex function f defined on D with f'(z) = u − iv. Further, if

f(z) = φ(x, y) + iψ(x, y),

then

∂φ/∂x = u, ∂φ/∂y = v, ∂ψ/∂x = −v and ∂ψ/∂y = u.

8.7.4 Tomography
This section discusses the role of complex analysis in tomography. The word
tomography comes from the Greek word for slice and the discipline deals with
finding internal structures of non-transparent objects by transmitting signals
such as electromagnetic waves at various frequencies. Examples of such fre-
quencies are radio transmissions, microwaves, light waves in the visual spec-
trum, x-rays, and acoustic waves. Tomography deals with inverse problems
in which the unknown parameters of a system are estimated from its known
reactions to specific signals.
Austrian mathematician Johann Radon studied the integral transform and
inversion of an integral of a real valued function over straight lines in 1917. The
technique became known as the Radon transform and it plays an important
role in tomography, specifically for computed axial tomography (CAT) scans,
barcode scanners, electron microscopy of structures like viruses and proteins,

reflection seismology, and other disciplines that involve applications of partial


differential equations.
Roger Penrose introduced a more complex version of the Radon transform in 1967. The Penrose transform relates massless fields in space-time to the cohomology of sheaves in complex projective spaces and is a significant aspect of twistor theory. The inspiration for the Penrose transform was a 1904 paper by Harry Bateman in which a harmonic function of four variables was represented as a contour integral of a holomorphic function of three complex variables, namely:
φ(w, x, y, z) = ∮ f[w + ix + (iy + z)ξ, iy − z + (w − ix)ξ, ξ] dξ.

Details may be found in a book titled Introduction to the Penrose Transform


by Baston and Eastwood [12]. Gindikin [13] studied the construction of an
explicit inversion formula for the Penrose transform focusing on connections
to the Radon transform and multi-dimensional residues.

8.8 Exercises
8.1. Write the following numbers in the form x + iy:
(a) (5 − 9i) + (2 − 4i)
(b) (1 − i)3
(c) (10 − 5i)/(6 + 2i)
(d) [(4 + 5i) + 2i^3]/(2 + i)^2
8.2. Let z = x + iy. Find the indicated expressions:
(a) Re(z^2)
(b) Im(z^2 + z̄^2)
(c) |z + 5z̄|

8.3. Write the following complex numbers in the polar form:


(a) 12/(√3 + i)
(b) (5 − 5i)

(c) −2 − 2√3 i

8.4. Write the following numbers, given in polar form, in the form x + iy:


Complex Analysis 591
√ 11 11
(a) z = 8 2(cos π + i sin π)
4 4
π π
(b) z = 6(cos + i sin )
8 8
π π
(c) z = 10(cos + i sin )
5 5
8.5. (a) Find z1 z2 and z1/z2 if
z1 = 2(cos(π/8) + i sin(π/8)),  z2 = 4(cos(3π/8) + i sin(3π/8)).
(b) Find (−1 + √3 i)^(1/2) and (−1 − √3 i)^(1/4).

8.6. If z1 = −1 and z2 = 5i, verify

arg(z1 z2 ) = arg(z1 ) + arg(z2 ),


arg(z1/z2) = arg(z1) − arg(z2).

8.7. Sketch the graphs of the following functions


(a) Re(z) = 4
(b) Im(z + 3i) = 6
(c) | z + 2 + 2i |= 2
8.8. Find the images of the given lines under the mapping f (z) = z 2 :
(a) y = 2
(b) y = x
8.9. Express the following functions in the form u(x, y) + iv(x, y):
(a) f (z) = 7z − 9iz − 3 + 2i
(b) f (z) = z 3 − 4z
(c) f (z) = z 4
(d) f (z) = z/(z + 1)
8.10. Evaluate the functions at the indicated points:
(a) f (z) = 2x − y 2 + i(xy 3 − 2x2 + 1)
(i) 2i (ii) 2 − i (iii) 5 + 3i
(b) f (z) = 4z + iz + Re(z)
(i) 4 − 6i (ii) −5 + 12i
(c) f (z) = ex cos y + iex sin y


(i) (π/4)i  (ii) 3 + (π/3)i
8.11. Find
(a) lim_{z→1+i} (5z^2 − 2z + 2)/(z + 1)
(b) lim_{z→1+i} (z^2 − 2z + 2)/(z^2 − 2i)
8.12. Show that the following limits do not exist:
(a) lim_{z→0} z̄/z
(b) lim_{z→1} (x + y − 1)/(z − 1)
8.13. Use the definition of differentiability to show that:
(a) f ′(z) = 2z if f (z) = z^2
(b) f ′(z) = −1/z^2 if f (z) = 1/z
8.14. Using rules of differentiation for complex functions, find f ′(z) if:
(a) f (z) = (2z + 1)(z^2 − 4z + 8i)
(b) f (z) = (z^2 − 4i)^3
(c) f (z) = (z^2 − 4i)^3
(d) f (z) = (3z − 4 + 8i)/(2z + i)
(e) f (z) = (5z^2 − z)/(z^3 + 1)
8.15. Find points at which the following functions are not differentiable:
(a) f (z) = z/(z − 3i)
(b) f (z) = (z^3 + 3)/(z^2 + 4)
8.16. State and prove the Cauchy-Riemann equations.
8.17. State a criterion for analyticity of a function of a complex variable.
8.18. Prove that u(x, y) and v(x, y) are harmonic if f (z) = u(x, y) + iv(x, y)
is analytic in a domain D.
8.19. Show that the following functions are not analytic at any point:
(a) f (z) = y + ix
(b) f (z) = x/(x^2 + y^2) + i y/(x^2 + y^2)
(c) f (z) = z̄
8.20. Show that the following functions are analytic in an appropriate domain:
(a) f (z) = ex cos y + iex sin y
(b) f (z) = 4x2 + 5x − 4y 2 + 9 + i(8xy + 5y − 1)
8.21. Verify that the following functions are harmonic.
(a) u(x, y) = 2x − 2xy
(b) u(x, y) = 4xy 3 − 4x3 y + x
8.22. Evaluate the given integrals along the indicated contours:
(a) ∫_C | z |^2 dz, where C is x = t^2, y = 1/t, 1 ≤ t ≤ 2
(b) ∫_C e^z dz, where C is the polygonal path consisting of the line segments from z = 0 to z = 2 and from z = 2 to z = 1 + πi.
8.23. State and prove the Cauchy-Goursat theorem.
8.24. Show that ∮_C f (z) dz = 0, where f (z) = z^3 − 1 + 3i and C is the unit circle | z | = 1.


8.25. State and prove Cauchy’s integral formula.
8.26. Evaluate:
(a) ∮_C 4/(z − 3i) dz, C : | z | = 5
(b) ∮_C (1 + 2e^z)/z dz, C : | z | = 1
8.27. State Laurent’s theorem.
8.28. Expand the given function in a Laurent series valid for the indicated annular domain:
(a) f (z) = (z − sin z)/z^5, 0 < | z |
(b) f (z) = 1/(z(z − 3)), 0 < | z − 3 | < 3
8.29. Determine the orders of the poles for the given functions:
(a) f (z) = (3z − 1)/(z^2 + 2z + 5)
(b) f (z) = (z − 1)/((z + 1)(z^3 + 1))
8.30. Find
(a) Res(f (z), 0) if f (z) = (4z − 6)/(z(2 − z)).
(b) Res(f (z), 2) if f (z) = e^{−z}/(z − 2)^2.
8.31. Find the residue at each pole of the given function
(a) f (z) = z/(z^2 + 16)
(b) f (z) = (2z − 1)/((z − 1)^4 (z + 3))
8.32. Use Cauchy’s residue theorem to evaluate the following integrals along
the indicated contours:
(a) ∮_C 1/(z^2 + 4z + 13) dz, C : | z − 3i | = 3
(b) ∮_C 1/(z^3 (z − 1)^4) dz, C : | z − 2 | = 3/2
8.33. Evaluate the following integrals:
(a) ∫_0^{2π} 1/(1 + 3 cos^2 θ) dθ
(b) ∫_0^{2π} sin^2 θ/(5 + 4 cos θ) dθ
(c) ∫_{−∞}^{∞} 1/(1 + x^2) dx
(d) ∫_{−∞}^{∞} cos x/(1 + x^2) dx
(e) ∫_{−∞}^{∞} (sin x)/x dx

8.34. Show that f (z) = ez is a conformal mapping for all z.


8.35. Determine whether the given mappings are conformal.

(a) f (z) = z 3 − 3z + 1
(b) f (z) = (z^2 − 1)^(1/2)
8.36. Show that the mapping w = (1/2)(z + 1/z) maps the circle | z | = r onto an ellipse with foci 1 and −1 in the w-plane.

8.37. Find the image of the line Im(z) = 2 under the linear fractional transformation w = (2z + i)/(2 − i).
8.38. Prove Theorem 65.
8.39. Prove Theorem 66.
8.40. Analyze the flow given by the complex potential f (z) = αz where α is
a non-zero complex constant.
8.41. Solve Example 298.
8.42. Solve Example 299.
8.43. Solve Example 300.
8.44. Solve Example 301.

8.45. Find the temperature T in the sector 0 ≤ arg z ≤ 2π/3, | z | ≤ 1 if T = 10°C on the x-axis, 80°C on the ray y = −√3 x, and the curved portion is insulated.
8.46. Find the complex potential of a parallel flow in the upward direction of
y = x.

8.9 Suggestion for Further Reading


Applications of complex analysis to image processing, Radon transforms, inverse problems, wavelet and shape analysis, free boundary problems, vortex dynamics and potential theory are popular areas for further research. The Penrose transform, a major component of twistor theory, is a complex analogue of the Radon transform. The importance of complex analysis can be seen in studies of electrical impedance tomography [2], [9].
Applications of complex wavelets to image processing are discussed in Ref-
erences [4] and [5]. Real wavelets have been applied to Radon transformations
but complex wavelets have not yet been applied to complex analogues of
Radon transforms.
The Penrose transform is a fascinating area for study by interested readers.
Gindikin’s 2014 article cited in Section 8.7.4 provides updated information
about the transform. Kuchment’s monograph [8] provides interesting data
and several references on tomography. Krantz [6] describes applications of
harmonic analysis to complex function theory. Huggett's text [3] introduces twistor theory and describes its relationship to complex analysis. Interesting discussions of the themes throughout this chapter appear in References [1], [8], [10] and [11].
Bibliography

[1] R. V. Churchill, Complex Variables and Applications, McGraw Hill, 2009.


[2] M. Hanke, N. Hyvonen and S. Revesing, An Inverse Back Scatter Problem
for Electrical Impedance Tomography, 2008.
[3] S. A. Huggett, An Introduction to Twistor Theory, Cambridge University
Press, 1994, on line publication, 2010.
[4] N. G. Kingsbury, Image processing with complex wavelets, Phil. Trans. Royal Society, London, 1999.
[5] N. G. Kingsbury, Complex wavelets for shift invariant analysis and filtering of signals, J. Appl. and Harm. Anal., 10(3): 234-253, 2001.
[6] S. G. Krantz, Explorations in Harmonic Analysis with Application to
Complex Function Theory and the Heisenberg Groups, Birkhauser, Berlin,
2009.
[7] E. Kreyszig, Advanced Engineering Mathematics, 10th Edition, Wiley, 2011.
[8] P. Kuchment, The Radon Transform and Medical Imaging, Vol. 85, Re-
gional Conference Series in Applied Mathematics (CBMS-NSF), SIAM,
2014.
[9] K. Paridis and W. R. B. Lionheart, Shape Corrections for 3DEIT,
MIMS ePrint. Manchester Institute of Mathematical Science, University
of Manchester UK, 2010.
[10] A. D. Wunsch, Complex Variable with Applications, Pearson Education,
2005.
[11] D. G. Zill and P. D. Shanahan, A First Course in Complex Analysis with
Applications, Jones and Bartlett, 2003.
[12] R. Baston and M. Eastwood, Introduction to the Penrose Transform, Dover Publications, New York, 2016 (reprint).
[13] S. Gindikin, Inversion of Penrose transform and Cauchy-Fantappie for-
mula, J. Geom. Phys., 78:127-131, 2014.

Chapter 9
Inverse Problems

9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600


9.2 Inverse Problems in Pre-calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
9.2.1 Inverse Problem in Torricelli’s Law . . . . . . . . . . . . . . . . . . . . . 601
9.2.2 Inverse Problem in Projectile Motion . . . . . . . . . . . . . . . . . . . 601
9.2.3 Inverse Problem in Scattering . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
9.2.4 Inverse Problem of Location and Identification . . . . . . . . . 602
9.2.5 Inverse Problem Related to Eigenvalues . . . . . . . . . . . . . . . 603
9.3 Inverse Problems in Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
9.3.1 Inverse Problem in Draining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
9.3.2 An Inverse Problem in Hanging Cable Models . . . . . . . . . . 604
9.3.3 Inverse Problems in Study of Trajectories . . . . . . . . . . . . . . 604
9.4 Inverse Problems in Matrix Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
9.4.1 Inverse Causation Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
9.4.2 Identification Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
9.4.3 Inverse Problem for Eigenvalues and Eigenvectors . . . . . . 607
9.4.4 Least Squares Solutions to Inverse Problems . . . . . . . . . . . 607
9.5 Inverse Problems in Differential Equations . . . . . . . . . . . . . . . . . 608
9.5.1 Inverse Problems for Mixing Problems . . . . . . . . . . . . . . . . . . 608
9.5.2 Inverse Problem in Newton’s Law of Falling . . . . . . . . . . . . 609
9.5.3 Inverse Problem in Newton’s Cooling Law . . . . . . . . . . . . . . 610
9.5.4 Inverse Problem in Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
9.5.5 Inverse Problem in Carbon Dating . . . . . . . . . . . . . . . . . . . . . 611
9.5.6 Inverse Problem in Population Growth . . . . . . . . . . . . . . . . 612
9.6 Inverse Problem in Partial Differential Equations . . . . . . . . . . . . . . . 612
9.6.1 Inverse Problem for Heat Equation . . . . . . . . . . . . . . . . . . . . . 612
9.6.2 Inverse Scattering Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
9.6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
9.6.4 Inverse Problem for Wave Equation . . . . . . . . . . . . . . . . . . . . . 614
9.6.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
9.6.6 Financial Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
9.7 Inverse Problems of Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
9.7.1 Fundamental Steps in Digital Image Processing . . . . . . . . 617
9.7.2 Introduction to Medical Imaging . . . . . . . . . . . . . . . . . . . . . . . . 618
9.7.3 Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
9.8 Seismic Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
9.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621


9.10 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622

9.1 Introduction
In this chapter we explain the concept of inverse problems through concrete examples. A mathematical model is a mathematical representation of an object such as a person, a building, a vehicle or a tree, or of a process such as a weather pattern, traffic flow or air flow over the wing of an aeroplane. Models are created from a mass of data, equations and computations that mimic the actions of the objects or processes they represent.
Models usually include graphic displays that translate number crunching into animations that one can see on a computer screen or other visual device. Models can be simple images, or they can be complex, carrying all the characteristics of the object or process they represent. A complex model will simulate the action and reaction of the real world phenomenon. To make these models behave as they would in real life, accurate real-time simulations require fast computers with a lot of number crunching power. Generally, modeling is the process of representing a real world object or phenomenon by a set of mathematical equations. More specifically, the term is often used to describe the process of representing three-dimensional objects on a computer.
For proper understanding of any real world system, a mathematical for-
mulation (mathematical modeling) of that system is essential. Usually mathe-
matical formulation is an ordinary differential equation or a partial differential
equation or in the form of an optimization problem (finding maxima or min-
ima of a function) or matrix equation. Finding solutions of these equations
(models) for given parameters and boundary and/or initial conditions is the response to a direct problem. Finding parameters, initial conditions or boundary conditions of a model from a given solution is known as an inverse problem.
There exists vast literature on direct problems. But in spite of the impor-
tance of inverse problems, less attention has been paid to the topic in many
parts of the world. The main objective of this chapter is to illustrate the con-
cepts of inverse problems with the help of concrete examples from pre-calculus,
calculus, matrix analysis, differential equations, partial differential equations,
image processing and medical imaging.

9.2 Inverse Problems in Pre-calculus


9.2.1 Inverse Problem in Torricelli’s Law


FIGURE 9.1: Tank Depicting Torricelli’s Law

To illustrate an inverse problem, imagine filling a tank with water and then drilling a hole in its side. Water squirts from the hole, follows an arc and splashes down at some distance from the tank, which we designate R. The relationship between the horizontal velocity v, the height h of the hole and the height D − h of the column of water above the hole (the hydraulic head), discovered by Evangelista Torricelli (1608-1647), a contemporary of Galileo, is known as Torricelli's law:

v = √(2g(D − h)),

where D is the depth of the water in the tank, h is the height of the hole above the bottom of the tank and g is the acceleration due to gravity.
The direct problem for Torricelli’s law is to determine the range of the
spurt R (see Figure 9.1), given D and h. A simple inverse problem is to find
the height h of the hole based on R.
Example 302. Find height h of the hole if the horizontal velocity v with
which water leaves the hole and D, the depth of water in the tank, are known.
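The direct and inverse computations can be sketched numerically. The following is my own illustration, not from the text; the function names and the value g = 9.81 m/s² are assumptions, and the range uses the free-fall time from height h.

```python
import math

G = 9.81  # assumed: acceleration due to gravity, m/s^2

def range_R(D, h, g=G):
    """Direct problem: range R of the spurt for water depth D and hole height h."""
    v = math.sqrt(2 * g * (D - h))   # Torricelli's law
    t = math.sqrt(2 * h / g)         # free-fall time from height h
    return v * t                     # simplifies to 2*sqrt(h*(D - h))

def hole_height(v, D, g=G):
    """Inverse problem (Example 302): height h of the hole from exit speed v and depth D."""
    return D - v ** 2 / (2 * g)
```

Note that the range 2√(h(D − h)) does not depend on g at all, a pleasant by-product of the model.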

9.2.2 Inverse Problem in Projectile Motion


Suppose a point projectile of unit mass is launched from the origin O at
an angle θ to the positive x-axis with initial speed v (see Figure 9.2). OP is
FIGURE 9.2: Projectile Motion

the horizontal range. The direct problem is to find OP if θ is given while the
inverse problem is to find θ given OP (horizontal range).
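Under the same flat-earth, no-resistance assumptions, the textbook range formula OP = v² sin(2θ)/g handles both directions of the problem. The sketch below is my own; note the inverse problem has two solutions, the angles θ and π/2 − θ.

```python
import math

G = 9.81  # assumed gravitational acceleration, m/s^2

def horizontal_range(v, theta, g=G):
    """Direct problem: horizontal range OP for launch speed v and angle theta."""
    return v ** 2 * math.sin(2 * theta) / g

def launch_angle(v, OP, g=G):
    """Inverse problem: the two launch angles giving range OP (if attainable)."""
    s = g * OP / v ** 2
    if s > 1:
        raise ValueError("range not attainable at this speed")
    t = 0.5 * math.asin(s)
    return t, math.pi / 2 - t      # low and high trajectories
```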
Example 303. State the physical law that governs the relationship between the depth of a well and the time it takes a dropped stone to reach the bottom of the well, neglecting air resistance.

9.2.3 Inverse Problem in Scattering


A certain type of signal is transmitted to strike an object called the scat-
terer and is bounced off the object or scattered. The scattered signal, which
has been affected by the scatterer is then collected and the characteristics of
the scatterer are inferred from the information contained in the scattered sig-
nal. Well known applications of this idea include radar, sonar and ultrasonic
medical imaging. Consider the following simple scattering problem posed by
Sir Isaac Newton, “Find the depth of a well from the sound of a stone falling
to the well and striking the bottom.” The direct problem is determination
of the time at which the echo is heard and the inverse problem is to find a
physical characteristic of the well, its depth, from one aspect of the reflected
signal, the echo time.
Example 304. State the physical law that governs the relationship between
the depth of well and the time it takes the dropped stone to reach the bottom,
neglecting air resistance.
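Newton's well problem can be sketched numerically. This is my own illustration: the echo time is modeled as free-fall time plus sound travel time, and the speed of sound c = 340 m/s is an assumed value.

```python
import math

G, C_SOUND = 9.81, 340.0  # assumed gravity (m/s^2) and speed of sound (m/s)

def echo_time(d, g=G, c=C_SOUND):
    """Direct problem: echo time = fall time + sound travel time for depth d."""
    return math.sqrt(2 * d / g) + d / c

def well_depth(T, g=G, c=C_SOUND):
    """Inverse problem: depth d from echo time T.

    Substituting s = sqrt(d) turns T = sqrt(2d/g) + d/c into the quadratic
    s**2/c + s*sqrt(2/g) - T = 0; take the positive root.
    """
    a, b = 1.0 / c, math.sqrt(2.0 / g)
    s = (-b + math.sqrt(b * b + 4 * a * T)) / (2 * a)
    return s * s
```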

9.2.4 Inverse Problem of Location and Identification


We consider here a simplified model of reflection seismology. Suppose an
explosive charge is detonated at some point on the flat surface of the earth.
The direct problem is to find the propagation time T for given velocity v,
the depth d of the stratum, and location X of the receiver. The determination
of v and d from X and T is an interesting inverse problem. For example,
a reflected seismic signal is heard at a geophone 300 meters away from the
source 3 seconds after detonation. What is the slowest possible velocity of propagation, irrespective of the depth of the stratum?
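A minimal sketch of the geophone question, under the assumption (mine) that the signal travels down to the reflecting stratum and back along straight rays, so T = 2√(d² + (X/2)²)/v. Since the path length is at least X, the velocity is bounded below by X/T whatever the depth.

```python
import math

def propagation_time(v, d, X):
    """Direct problem: two-way travel time to a reflector at depth d,
    receiver at offset X from the source (straight-ray model, an assumption)."""
    return 2 * math.sqrt(d ** 2 + (X / 2) ** 2) / v

def slowest_velocity(X, T):
    """Inverse bound: v = 2*sqrt(d^2 + (X/2)^2)/T >= X/T for any depth d >= 0."""
    return X / T
```

For X = 300 m and T = 3 s this lower bound is 100 m/s.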

9.2.5 Inverse Problem Related to Eigenvalues

Given a real m × m symmetric matrix P and m real numbers λ1, λ2, . . . , λm, find a diagonal matrix D such that P + D has eigenvalues λ1, λ2, . . . , λm. This problem is inverse to the direct problem of computing the eigenvalues of the given matrix P + D.

9.3 Inverse Problems in Calculus


9.3.1 Inverse Problem in Draining
Let a vessel be formed by revolving a curve x = f (y) about y-axis. We can
fill this vessel to various depths with water and allow water to flow out under
the influence of gravity through an orifice of cross sectional area a at the base
of the vessel (Figure 9.3). The depth y of water in the vessel can be observed,
say, by means of a pilot tube of negligible volume. The time T (y) for a given
depth y that it takes for the vessel to drain completely can be measured with
a stop watch. The drain time function depends on the shape of the vessel.
Applying Torricelli’s law, we can derive that
T T
π(f (y))2 dy (f (u))2
Z Z
π
T (y) = − √ dt = √ √ du. (9.1)
0 a 2gy dt a 2g 0 u

This equation provides relationship between the drain time function T and
(f (u))2
the shape function f provided f is continuous and √ is integrable on
u
[0, T ]. Computing the drain time function T for given f is known as the direct
problem. The inverse problem consists of determining the shape f if the drain
time T is given.

Example 305. A classical inverse problem in an ancient technology was


design of a clepsydra or water clock. One design has depth y change at a
constant rate with time. What is the shape function of such a clepsydra?
Example 306. Show that the weight distribution is constant if the shape is a parabola.
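The drain-time integral can be sketched numerically; the code below is my own, uses the midpoint rule (which also sidesteps the √u singularity at 0), and takes the clepsydra shape x = y^(1/4) of Example 305 as an assumed test case. For that shape the integrand (f(u))²/√u is constant, so T(y) is linear in y and the water level falls at a constant rate.

```python
import math

def drain_time(f, y, a, g=9.81, n=10000):
    """Midpoint-rule approximation of T(y) = pi/(a*sqrt(2g)) * ∫_0^y f(u)^2/sqrt(u) du."""
    h = y / n
    s = sum(f((k + 0.5) * h) ** 2 / math.sqrt((k + 0.5) * h) for k in range(n))
    return math.pi / (a * math.sqrt(2 * g)) * s * h
```

With f(y) = y^0.25 one can check numerically that doubling the depth doubles the drain time.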
FIGURE 9.3: Influence of Gravity on Liquid Level

9.3.2 An Inverse Problem in Hanging Cable Models


Let us consider a cable that supports a horizontally distributed load, for
example a suspension bridge (see Figure 9.4).

FIGURE 9.4: Loaded Cable

We neglect the weight of the cable. Assume that the weight of the load is distributed nonuniformly along the interval [−1, 1] of the x-axis and, in the simplest case, that the cable has a unique lowest point, say at x = 0. For a weight distribution w(x), ∫_{−1}^{x} w(u) du is the total weight of the segment [−1, x] for each x ∈ [−1, 1].
A given weight distribution w gives a shape y = y(x) for the hanging cable.
The direct problem is to find shape y caused by a given weight distribution w
while its inverse problem is to determine a distribution w from observing the
shape y.

9.3.3 Inverse Problems in Study of Trajectories


Consider again motion of a projectile with respect to a flat earth, see
Figure 9.5.
We consider a projectile of unit mass that is fired horizontally with a given
velocity v from a height h as shown in Figure 9.5. It can be derived that the
FIGURE 9.5: Horizontal shot from height

path of the projectile is parabolic if there is no resistance and the gravitational force is treated as a constant acting vertically; that is, the path of a projectile under no resistance and constant gravity is a parabola. Two related inverse problems are:
1. If the trajectory of a projectile is parabolic and there is no resistance, is
gravitational force necessarily constant?
2. If the trajectory of a projectile is parabolic under constant gravitational
force, can there be any resistance?
Example 307. Let a particle of unit mass be acted upon by a vertical gravitational force g(y) and follow the arc of the circular trajectory x^2 + y^2 = a^2. Find the form of g(y).

9.4 Inverse Problems in Matrix Equations


Consider a matrix equation of type Ax = y, where A is an n × n matrix and x and y are vectors. A direct problem is to find y if A and x are given. Finding x if A and y are given is known as an inverse problem in our terminology; this problem of solving the linear system is often called a causation problem. If x and y are given, finding A is another inverse problem, called the identification problem. In courses on linear algebra, existence and uniqueness of a solution are treated as critical, but little attention is paid to a third issue: inverse problem instability. In this section we illustrate inverse problems of matrix equations with special reference to elementary tomography, gravimetry, dynamical parameters of binary dynamical systems and stereography.

9.4.1 Inverse Causation Problem


Let A be an m × n matrix. Finding the vector y = Ax for given A and x is the direct problem. The inverse causation problem involves finding all vectors x such that Ax = y for a given A and vector y. This problem gets a lot of attention in linear algebra but is rarely seen as an example of an inverse problem. Another interesting inverse problem, called the identification problem, is the one in which we find a matrix A such that Ax = y for given vectors x and y. In this subsection we confine the discussion to the inverse causation problem. A solution x ∈ R^n of the inverse causation problem, where A is an m × n matrix and y ∈ R^m is a given vector, exists if and only if y belongs to the range of A, that is, to the subspace
R(A) = {Ax|x ∈ Rn }.
Determining whether y ∈ R(A), that is, whether a solution exists, and finding
all solutions can be accomplished by the method of Gaussian elimination. The
uniqueness issue is addressed by the null space of A, that is
N (A) = {x ∈ Rn |Ax = 0}.
The Gaussian elimination method is an elegant algorithm for characterizing
the null space and settling the uniqueness question. Let us assume that Ax = y has a unique solution for each y, that is, A^{−1} exists. We are interested in knowing the error in the solution x for small errors in y. Let x be the unique solution of the system Ax = y for known y, and let x̃ be the unique solution of the perturbed system Ax̃ = ỹ, where ỹ is a perturbation of y. Then

‖x − x̃‖ = ‖A^{−1}y − A^{−1}ỹ‖ ≤ ‖A^{−1}‖ ‖y − ỹ‖,

so that

‖x − x̃‖ ≤ ‖A^{−1}‖ ‖y‖ (‖y − ỹ‖/‖y‖) ≤ ‖A^{−1}‖ ‖A‖ ‖x‖ (‖y − ỹ‖/‖y‖),

since ‖y‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖. Hence

‖x − x̃‖/‖x‖ ≤ cond(A) ‖y − ỹ‖/‖y‖,

where cond(A) = ‖A^{−1}‖ ‖A‖ is called the condition number of the matrix A with respect to the norm ‖·‖. The condition number gives an upper bound for the relative error in the solution caused by a relative error in y. For matrices with large condition numbers, that is, ill-conditioned matrices, relatively small changes in y can yield relatively large changes in the solution x.
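The amplification described by the condition number is easy to observe numerically. The example below is mine; the Hilbert matrix is a classic ill-conditioned test case, and the perturbation is a tiny random vector.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x = np.ones(n)
y = A @ x

# Perturb y by a tiny amount and re-solve the system.
dy = 1e-10 * rng.standard_normal(n)
x_tilde = np.linalg.solve(A, y + dy)

rel_in = np.linalg.norm(dy) / np.linalg.norm(y)
rel_out = np.linalg.norm(x - x_tilde) / np.linalg.norm(x)
print(np.linalg.cond(A), rel_in, rel_out)  # rel_out can be many orders larger than rel_in
```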

9.4.2 Identification Problems


The identification problem is finding m × n matrices A for given pairs of
vectors (x, y) connected by
Ax = y.
For each such pair we call x the input and y the output. An m × n matrix A
is called identifiable from the matrix pair (x, y) if Ax = y.

9.4.3 Inverse Problem for Eigenvalues and Eigenvectors


For a given matrix A, finding a scalar λ and a nonzero vector x such that Ax = λx is called an eigenvalue problem. It is a direct problem whose solution consists of an eigenvalue λ and an eigenvector x.
The inverse problem for eigenvalues and eigenvectors is: given eigenvalues and corresponding eigenvectors of a matrix, find the matrix. For example, suppose that the eigenvalues of

A = [ α    −β
      −β   γ + η ]

are 2 and 7 and the eigenvalues of the reduced matrix

Ā = [ α    −β
      −β   γ ]

are 1 and 5. Find A and Ā.
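The example above can be finished with trace and determinant bookkeeping; one consistent answer (my own working, with β = 2 chosen — β = −2 works equally well) is verified numerically below.

```python
import numpy as np

# eig(A) = {2, 7}     =>  alpha + gamma + eta = 9,  alpha*(gamma + eta) - beta**2 = 14
# eig(A_bar) = {1, 5} =>  alpha + gamma = 6,        alpha*gamma - beta**2 = 5
eta = 9 - 6                          # eta = 3
alpha = (14 - 5) / eta               # subtracting the two determinant equations: eta*alpha = 9
gamma = 6 - alpha                    # gamma = 3
beta = (alpha * gamma - 5) ** 0.5    # beta**2 = 4; take beta = 2

A = np.array([[alpha, -beta], [-beta, gamma + eta]])
A_bar = np.array([[alpha, -beta], [-beta, gamma]])
print(np.linalg.eigvalsh(A), np.linalg.eigvalsh(A_bar))
```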

9.4.4 Least Squares Solutions to Inverse Problems


In the case when b ∈ Rm is not in range of the m × n matrix A, there is no
solution to the problem Ax = b. A remarkable relationship between the null
space, range, and transpose allows the development of a type of generalized
solution, and such generalized solutions always exist. One type is called the least squares solution: a vector u ∈ R^n that minimizes the quantity ‖Ax − b‖ over all x ∈ R^n, where the norm is the usual Euclidean norm. Note that this minimum is zero if and only if the system has a solution. If u is a least squares solution, then for any vector v ∈ R^n, the function

g(t) = ‖A(u + tv) − b‖^2 = ‖Au − b‖^2 + 2(Av, Au − b)t + ‖Av‖^2 t^2,

where (·, ·) is the familiar Euclidean inner product, has a minimum at t = 0. The necessary condition g′(0) = 0 for a minimum then gives

(Av, Au − b) = 0

and hence (v, A^T Au − A^T b) = 0 for all v ∈ R^n. That is, if u is a least squares solution, then

A^T Au = A^T b,
where A^T is the transpose of A. Conversely, if A^T Au = A^T b, then for any x ∈ R^n,

‖Ax − b‖^2 = ‖A(x − u) + Au − b‖^2
           = ‖A(x − u)‖^2 + 2(A(x − u), Au − b) + ‖Au − b‖^2
           = ‖A(x − u)‖^2 + 2(x − u, A^T Au − A^T b) + ‖Au − b‖^2
           ≥ ‖Au − b‖^2,

that is, u is a least squares solution of Ax = b.


So, least squares solutions of Ax = b coincide with ordinary solutions of the symmetric problem A^T Ax = A^T b. This symmetric problem always has a solution since R(A^T) = R(A^T A), and hence A^T b ∈ R(A^T A) for any b ∈ R^m. Therefore, any linear system Ax = b has a least squares solution. However, least squares solutions need not be unique. Indeed, if u is a least squares solution, then so is u + v for any v ∈ N(A); that is, the set of least squares solutions forms a hyperplane parallel to the null space. Therefore, if A has a nontrivial null space, then Ax = b has infinitely many least squares solutions.
However, one least squares solution can be distinguished from the others, namely the one that is orthogonal to the null space. There can be at most one such solution: if u and w are both least squares solutions orthogonal to N(A), then A^T A(u − w) = A^T b − A^T b = 0, and hence u − w ∈ N(A^T A) = N(A). Therefore u − w ∈ N(A) ∩ N(A)⊥, that is, u = w. On the other hand, there is always a least squares solution orthogonal to the null space, and hence any linear system has a unique least squares solution that is orthogonal to the null space of the coefficient matrix. If we agree to accept this notion of generalized solution, then every linear system has a unique (generalized) solution.
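These facts can be observed with NumPy; the example matrices below are my own. np.linalg.lstsq returns the minimum-norm least squares solution, which is exactly the distinguished solution orthogonal to the null space.

```python
import numpy as np

# A rank-deficient system: the second column duplicates the first,
# so N(A) is nontrivial and least squares solutions form a line.
A = np.array([[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

u, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimum-norm least squares solution

# Any u + t*v with v in N(A) (here v = (1, -1)) has the same residual.
v = np.array([1.0, -1.0])
r1 = np.linalg.norm(A @ u - b)
r2 = np.linalg.norm(A @ (u + 0.7 * v) - b)
print(u, r1, r2)
```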

9.5 Inverse Problems in Differential Equations


9.5.1 Inverse Problems for Mixing Problems
Direct mixing problems for well-stirred solutions are well known and taught
in an introductory course of ordinary differential equations. Given an initial
concentration of a solute and certain inflow and outflow rates, we can deter-
mine the concentrations at future times. The derivation of the model relies on
rate balancing. Let x(t) denote the quantity of solute in the vessel at time t; then the rate of change of x(t) with time, namely dx/dt, is the difference between the rate at which the solute enters the vessel (inflow rate) and the rate at which it leaves the vessel (outflow rate). Thus

dx/dt = ar − (x/V) r,   (9.2)

where V represents the volume of the vessel, a denotes the concentration of the inflowing solution and r denotes the rate at which liquid enters and leaves the vessel. The concentration of solute in the vessel is given by c(t) = x(t)/V, and so the differential equation (9.2) takes the following form in terms of the concentration:

dc/dt = (r/V)(a − c).   (9.3)

Equation (9.3) has the unique solution

c(t) = a + (c0 − a) e^{−rt/V},

where c0 is the initial concentration of solute in the vessel. The direct problem is to find c(t), the concentration at time t, for given parameters a, c0, V and r. There are several related inverse problems, such as finding the initial concentration, or finding the parameters from a given concentration at time t.
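The direct solution and one of the inverse problems (recovering the flow rate r from a single observation) can be sketched as follows; this is my own illustration with assumed function names.

```python
import math

def concentration(t, a, c0, V, r):
    """Direct problem: solution c(t) = a + (c0 - a)*exp(-r*t/V) of dc/dt = (r/V)(a - c)."""
    return a + (c0 - a) * math.exp(-r * t / V)

def flow_rate(t, ct, a, c0, V):
    """Inverse problem: recover r from one observation c(t) = ct by inverting the exponential."""
    return -V / t * math.log((ct - a) / (c0 - a))
```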

9.5.2 Inverse Problem in Newton’s Law of Falling


Let s(t) denote the position of a particle, that is, its distance down a ramp
from its origin, see Figure 9.6. Since the gravitational force in the direction of


FIGURE 9.6: Descent of Ramp

the ramp is g sin θ, we have by Newton's law

d^2 s/dt^2 = g sin θ,   s(0) = 0,   (ds/dt)(0) = 0.

The solution of the initial value problem is

s = (g/2)(sin θ) t^2.
We review Galileo’s law of falling bodies if θ = π2 . For a given fixed time T,


the curve of points reached in time T for various declination angles θ is a circle
g
of radius T 2 :
4
Let x = s cos θ, y = s sin θ.
Using the equation for s, we get
g 2 2 g 2 g
x2 = T sin θ T cos2 θ = y( T 2 − y)
2 2 2
and hence
g g
x2 + (y − T 2 )2 = ( T 2 )2 .
4 4
The direct problem is the determination of the forms of the equitemporal
curves, for certain given laws, while the inverse problem is to find the form of
the resistance law from the knowledge of shapes of the equitemporal curves.

9.5.3 Inverse Problem in Newton’s Cooling Law


According to the simplest version of Newton's law of cooling, the rate at which the surface temperature changes in time is proportional to the difference between the ambient and surface temperatures. Let the surface temperature at time t be T(t) and let the ambient temperature be constant, say S. Newton's law of cooling implies that

dT/dt = λ(S − T),

where λ, the heat transfer coefficient, is a positive constant. The direct problem of determining the surface temperature has the unique solution

T(t) = S + (T(0) − S) e^{−λt}.

This solution depends on three parameters: the ambient temperature S, the


initial temperature T(0), and the heat transfer coefficient λ. Given surface temperatures at appropriate times, determination of the parameters S, T(0) and λ is an inverse problem.
Example 308. Surface temperatures of a body that cools according to New-
ton’s law are given in the following table.

t (min)   u (°F)
   5        72
  10        62
  15        54

Find the ambient temperature, the initial surface temperature, and heat
transfer coefficient (S, T (0), λ).
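Assuming the readings are taken at equally spaced times t = 5, 10, 15 min (reading the table's second time as 10, which I take to be a misprint for the non-monotone printed value), the differences u_k − S form a geometric sequence with ratio e^{−λ·Δt}, which yields S in closed form. The sketch below is my own.

```python
import math

def fit_newton_cooling(dt, u1, u2, u3):
    """Recover (S, lam) from three temperatures at equally spaced times.

    Since u_k - S is geometric with ratio exp(-lam*dt),
    (u2 - S)/(u1 - S) = (u3 - S)/(u2 - S), which solves for S directly.
    """
    S = (u1 * u3 - u2 ** 2) / (u1 + u3 - 2 * u2)
    lam = -math.log((u2 - S) / (u1 - S)) / dt
    return S, lam

S, lam = fit_newton_cooling(5.0, 72.0, 62.0, 54.0)
T0 = S + (72.0 - S) * math.exp(lam * 5.0)   # extrapolate back to t = 0
print(S, lam, T0)
```

Under these assumptions the data give S = 22 °F and T(0) = 84.5 °F.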

9.5.4 Inverse Problem in Finance


In a continuously compounded interest model, the percentage rate of change of an investment is a given constant interest rate r; that is, the rate of change of the value, relative to the current value, is the constant r. A mathematical formulation is

(1/f) df/dt = r,

where f is the value history, that is, f(t) denotes the value of the investment at time t. This, of course, is equivalent to

d/dt (ln f) = r.

The solution of this equation is

f(t) = f(0) e^{rt}.

In the variable interest rate model, the interest rate is a function of time, say r(t). This implies

d/dt (ln f) = r(t),

which gives

f(t) = f(0) e^{∫_0^t r(s) ds}.

Given f(0) and r(t), finding f(t) is regarded as a direct problem, while finding r(t) given f(t) for all t is an inverse problem. Here f(t) is called the value history and r(t) is known as the variable interest rate.

Example 309. Find the interest rate r(t) so that Rs 50,000 invested yields Rs 50,000 in 10 years.
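The direct/inverse pair above can be sketched numerically: integrate an assumed rate history r(t) to build the value history, then differentiate ln f to recover it. All names and the example rate are my own choices.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 2001)
dt = t[1] - t[0]
r = 0.05 + 0.01 * np.sin(t)                     # assumed example rate history

# Direct problem: f(t) = f(0) * exp(∫_0^t r(s) ds), via a trapezoidal cumulative integral.
cumint = np.concatenate(([0.0], np.cumsum((r[1:] + r[:-1]) / 2.0) * dt))
f = 50000.0 * np.exp(cumint)

# Inverse problem: r(t) = d/dt ln f(t), recovered by numerical differentiation.
r_rec = np.gradient(np.log(f), t)
```

Away from the endpoints (where np.gradient falls back to one-sided differences) the recovered rate matches the true one to high accuracy.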

9.5.5 Inverse Problem in Carbon Dating


Solution of Equation (2.10) is the basis of an important technique devel-
oped by the chemist Willard Libby in 1950 to determine the ages of certain
artifacts. Libby was awarded the 1960 Nobel Prize for chemistry for this work.
The process of estimating the age of an artifact or fossil is called carbon dat-
ing. The theory of carbon dating is essentially based on the fact that the
half life of radioactive C–14 is approximately 5600 years. Libby’s method has
been used to date furniture in Egyptian tombs, examine Van Meegeren Art
forgeries and date different civilizations through archaeological excavations. In Example 128 the direct problem is solved. The inverse problem is to find λ while T and A(t) are given.

9.5.6 Inverse Problem in Population Growth


A direct problem is studied in Section 2.6.1 (Example 121). Finding λ
when P (t) and P (0) are given is an inverse problem. In Example 122 finding
the annual growth rate is an inverse problem.
Example 310. Consider the following one-dimensional boundary value problem.
Find u(x) such that
 
-\frac{d}{dx}\left(a\,\frac{du}{dx}\right) = f \quad \text{on } (0,1)

a(0)\,\frac{du}{dx}(0) = g_0, \qquad a(1)\,\frac{du}{dx}(1) = g_1.
We are interested in the following inverse problem. Given z, a measurement
of u, find a which together with z satisfies the boundary value problem.
Let z = u + \mu(x), where \int_0^1 |\mu(x)|^2\,dx < \infty. We choose f = 0, g_0 = g_1 = 1.
Then u(x) = x + 1 and the corresponding coefficient is a(x) = 1. If we have
z_n(x) = u(x) + \epsilon\,\frac{\cos(2n\pi x)}{2n\pi}, \qquad 0 < \epsilon < 1,

the corresponding solution is

a_n(x) = \frac{1}{1 - \epsilon\sin(2n\pi x)}.
It can be checked that z_n \to z but a_n \not\to a.
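The non-convergence is easy to observe numerically. In the sketch below (our own, with ε = 0.5 and a uniform sampling grid), the data error max|z_n − u| decays like ε/(2πn) as n grows, while the coefficient error max|a_n − a| does not decay at all:

```python
import math

eps = 0.5
u = lambda x: x + 1.0          # exact solution
a_exact = lambda x: 1.0        # exact coefficient

def z_n(x, n):                 # perturbed measurement of u
    return u(x) + eps * math.cos(2.0 * n * math.pi * x) / (2.0 * n * math.pi)

def a_n(x, n):                 # coefficient reconstructed from z_n
    return 1.0 / (1.0 - eps * math.sin(2.0 * n * math.pi * x))

xs = [i / 1000.0 for i in range(1001)]
for n in (1, 10, 100):
    data_err = max(abs(z_n(x, n) - u(x)) for x in xs)
    coef_err = max(abs(a_n(x, n) - a_exact(x)) for x in xs)
    print(n, data_err, coef_err)
# data_err -> 0 like eps/(2*pi*n); coef_err stays of order eps/(1 - eps) = 1
```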

9.6 Inverse Problem in Partial Differential Equations


9.6.1 Inverse Problem for Heat Equation
Consider a rod of unit length and unit thermal conductivity whose ends are
kept at a fixed temperature. The temperature distribution u(x, t) satisfies the heat
equation
a^2\,\frac{\partial^2 u}{\partial x^2} - \frac{\partial u}{\partial t} = 0, \qquad 0 < x < 1,\; t > 0
with the initial and boundary conditions
u(x, 0) = u0 (x) and u(0, t) = u(1, t) = 0.
The inverse problem is to find the initial temperature distribution u0 (x) when
the temperature distribution at time T > 0 is given. Finding a while u is
known is another inverse problem, as is determining u0(x) if a and u are given.
This problem can be solved following Amita and Siddiqi [41].
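The instability of the backward problem is easy to quantify. For the separated solution u(x, t) = \sum_n b_n \sin(n\pi x)\, e^{-(n\pi a)^2 t}, recovering u0 from the data u(·, T) multiplies the n-th mode of the data, and hence of any measurement noise, by e^{(n\pi a)^2 T}. A sketch with the assumed values a = 1 and T = 0.1 (our own illustration):

```python
import math

a, T = 1.0, 0.1   # assumed diffusivity and observation time

def amplification(n):
    """Factor multiplying the n-th Fourier sine mode of the final data
    u(., T) when the heat equation is solved backward for u0."""
    return math.exp((n * math.pi * a) ** 2 * T)

for n in (1, 5, 10):
    print(n, amplification(n))
# mode 1 is amplified by about 2.7, mode 10 by more than 10**40:
# high-frequency noise in the data completely destroys u0
```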
9.6.2 Inverse Scattering Problem


In inverse scattering problems one is looking for an object D or a medium
parameter from results of observations of a so-called field generated by (plane)
incident waves u_0(x) = \exp(ik\,\hat{\theta}\cdot x), where k > 0 denotes the wave number and
θ̂ is a unit vector that describes the direction of the incident wave. The field
(acoustic, electromagnetic or elastic) in the simplest situation of scattering by
an obstacle D is a solution u to the Helmholtz equation

∆u + k 2 u = 0 in RN \D (N = 2, 3)

satisfying the homogeneous Dirichlet boundary condition


u = 0 on ∂D (soft obstacle)

or another boundary condition like the Neumann condition


\partial_\nu u + bu = 0 \quad \text{on } \partial D \quad \text{(hard obstacle)}
along with
\frac{\partial u_s}{\partial r} - ik\,u_s = O\!\left(r^{-(N+1)/2}\right) \quad \text{for } r = |x| \to \infty, \text{ uniformly in } \frac{x}{|x|}.

This solution is assumed to be the sum of the plane incident wave u_0 and a
scattered wave u_s due to the presence of an obstacle, that is,

u = u0 + us .

In most cases in scattering theory one assumes that R3 \D is connected, and


D is a bounded open set with Lipschitz boundary. In some situations spherical
incident waves (depending only on |x − a|) are more useful and natural.
For acoustic scattering problems, v(x, t) = u(x)e^{i\omega t} describes the pressure
and k = \omega/c is the wave number with speed of sound c. For suitably polarized
time-harmonic electromagnetic scattering problems, Maxwell's equations
reduce to the two-dimensional Helmholtz equation for the components of the
electric (or magnetic) field u. The wave number k is given in terms of the
dielectric constant \epsilon and permeability \mu by k = \sqrt{\epsilon\mu}\,\omega.
In both cases, the radiation condition yields the following asymptotic be-
havior:
u_s(x) = \frac{\exp(ik|x|)}{|x|^{(N-1)/2}}\,u_\infty(\tilde{x}) + O\!\left(|x|^{-(N+1)/2}\right) \quad \text{as } |x| \to \infty,

where \tilde{x} = x/|x|. The inverse problem is to determine the shape of D when
the far field pattern u_\infty(\tilde{x}) is measured for all \tilde{x} on the
unit sphere in \mathbb{R}^N.
9.6.3 Examples
These and related inverse scattering problems have various applications in
computed tomography, seismic and electromagnetic exploration in geophysics,
and nondestructive testing of materials.

9.6.4 Inverse Problem for Wave Equation


In the following equation,
\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2},
u(x, t) represents the displacement, for example, of a vibrating string from its
equilibrium position, and c the wave speed.
These types of equations have been applied to model vibrating membranes,
acoustic problems for the velocity potential of the fluid flow through which
sound can be transmitted, longitudinal vibrations of an elastic rod or beam,
and both electric and magnetic fields in the absence of charge and dielectric.
Solving this equation under given initial and/or boundary conditions for a
given parameter is called a direct problem. Finding the parameter c for a given
solution under appropriate boundary and initial conditions is an inverse
problem. Similarly for a given solution and parameter, determining an initial
or boundary condition is also known as an inverse problem.
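Since the equation gives c² = u_tt/u_xx wherever u_xx ≠ 0, the parameter can be estimated from a sampled solution by finite differences. A sketch with a manufactured standing-wave solution (our own illustration, with an assumed true speed of 3):

```python
import math

L, c_true = 1.0, 3.0
u = lambda x, t: math.sin(math.pi * x / L) * math.cos(math.pi * c_true * t / L)

def estimate_c(u, x, t, h=1e-4):
    """Recover the wave speed from samples of u via c^2 = u_tt / u_xx,
    using central second differences."""
    utt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h ** 2
    uxx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h ** 2
    return math.sqrt(utt / uxx)

print(estimate_c(u, 0.3, 0.1))   # ≈ 3.0
```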

9.6.5 Examples
The inverse problem for the following well known equations can be posed
on similar lines.
(a) Laplace equation in R2 (two-dimensional)
\nabla^2 u = \Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0,
where \nabla^2 = \nabla\cdot\nabla denotes the Laplacian and

\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right).
The equation is satisfied by the electrostatic potential in absence of charges,
by the gravitational potential in the absence of mass, by the equilibrium dis-
placement of a membrane with a given displacement of its boundary and by
the velocity potential for an inviscid, incompressible, irrotational homogeneous
fluid in the absence of sources and sinks and many other real world systems.
(b) Poisson equation (non-homogeneous Laplace equation)
∇2 u = −f (x, y).
One encounters this equation while studying the electrostatic potential in the
presence of charge, the gravitational potential in the presence of distributed
matter, the equilibrium displacement of a membrane under distributed forces,
the velocity potential for an inviscid, incompressible, irrotational homogeneous
fluid in the presence of distributed sources and sinks, and the steady state
temperature in the presence of thermal sources or sinks.
(c) Transport equation in R (one-dimensional)
\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} = 0,
where c is a constant and u(x, t) denotes the location of a car at time t and
position x.
(d) Traffic flow
\frac{\partial u}{\partial t} + a(u)\,\frac{\partial u}{\partial x} = 0,
where u(x, t) denotes the density of cars per unit kilometer of expressway at
location x and time t and a(u) is a function of u, say the local velocity of
traffic at location x at time t.
(e) Burgers' equation in one dimension

\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = 0.
This equation arises in the study of streams of particles or fluid flow with zero
viscosity.
(f) Eikonal equation in R^2 (two-dimensional)

\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 = 1
models problems of geometric optics.
(g) Helmholtz equation

(∇2 + k 2 )u = 0

has been found useful in diffraction theory.


(h) Klein-Gordon equation
\frac{\partial^2 u}{\partial t^2} - c^2\nabla^2 u + m^2 u = 0.
This equation arises in quantum field theory, where m denotes the mass.
(i) Telegraph equation
\frac{\partial^2 u}{\partial t^2} + A\,\frac{\partial u}{\partial t} + Bu = \frac{\partial^2 u}{\partial x^2}.
A and B are constants. This equation arises in the study of propagation of


electrical signals in a cable transmission line. Both the current I and voltage
V satisfy equations of this type. This equation also arises in the propagation
of pressure waves in the study of pulsatile blood flow in arteries.
(j) Schrödinger equation (time independent)

\frac{h^2}{2m}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) + (E - V)u = 0.

Note that m is the mass of the particle whose wave function is u(x, y), h is
the universal Planck’s constant, V is the potential energy and E is a constant.
This equation arises in quantum mechanics. If V = 0 then it reduces to the
Helmholtz equation.
(k) Korteweg-de Vries (KdV) equation in one dimension

\frac{\partial u}{\partial t} + cu\,\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0.
This equation arises in shallow water waves.
(l) Euler equation in R3
\frac{\partial u}{\partial t} + (u\cdot\nabla)u + \frac{1}{\rho}\nabla p = 0.

Here u denotes the velocity field and p the pressure.
(m) Navier-Stokes equation in R3
u_t + (u\cdot\nabla)u + \frac{1}{\rho}\nabla p = \nu\,\nabla^2 u.

Here \nu denotes the kinematic viscosity, p the pressure, \rho the density of
the fluid, and u = (u_1, u_2, u_3) the velocity of the fluid.
(n) Maxwell equations in R3
\frac{\partial E}{\partial t} - \nabla\times H = 0,

\frac{\partial H}{\partial t} + \nabla\times E = 0.
E and H denote the electric and the magnetic fields, respectively.

9.6.6 Financial Mathematics


Options are contracts that give an owner the
right to buy or to sell a fixed number of underlying assets in a specified
common market at a fixed price on or before a prescribed date. Options are widely
traded on all major exchanges. Strike price or exercise price is the price at
which the underlying asset is bought or sold under the option. Underlying assets are
commodities, shares, foreign currencies, government or private enterprise bonds,
stocks and stock indices. Equity is a share in the ownership of a company
which usually guarantees the right to vote at meetings and share in dividends
(payments to shareholders as return for investment). A derivative (financial
derivative) is a contract or security whose payoff is determined by the value of an
underlying (asset); in many cases the underlying of a derivative is the price of an equity. The termination time
of a derivative (contract), usually the time when payoff value is calculated and
paid, is called expiry (time). Volatility is a measure of standard deviations of
returns. It is a function of underlying asset and time.
Let u denote the value or premium of the options which depends on a
number of factors such as stock price say x, current time t, the maturity date
T, the exercise price K, the risk-free interest rate r, and the local volatility (infinitesimal
standard deviation) σ (which may be constant, only time dependent, or may satisfy
a relation of the type σ(x, t) = σ(x)σ(t)). It has been proved that u(x, t) is a solution
of the partial differential equation

\frac{\partial u}{\partial t} + \frac{1}{2}x^2\sigma^2(x,t)\,\frac{\partial^2 u}{\partial x^2} + rx\,\frac{\partial u}{\partial x} - ru = 0. \qquad (9.4)
This is the Black-Scholes equation.
Direct problem of option pricing: Given the local volatility function
σ(x, t) find the solution of (9.4) with boundary conditions:

u(x, T) = \max(x - K, 0) \quad \text{and} \quad u(0, t) = 0. \qquad (9.5)

Inverse problem: Find the local volatility function σ(x, t) such that the
solution to (9.4) with different strikes K and maturities T satisfy

u(x∗ , t∗ , K, T ) = u∗ (K, T ) (9.6)

where the right hand side denotes the current market price of the option with
the corresponding strike and maturity at time t∗ when the underlying price
(asset) is x∗ . Practitioners refer to this problem as model calibration.
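In the special case of constant volatility, (9.4) with the conditions (9.5) has the classical Black-Scholes closed-form solution, and calibration reduces to computing the implied volatility: the constant σ that reproduces a quoted price. A sketch (our own illustration; since the call price is increasing in σ, simple bisection suffices):

```python
import math

def call_price(x, K, r, tau, sigma):
    """Black-Scholes value of a European call with constant volatility;
    tau = T - t is the time to maturity."""
    N = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * N(d1) - K * math.exp(-r * tau) * N(d2)

def implied_vol(price, x, K, r, tau, lo=1e-6, hi=5.0):
    """Simplest instance of model calibration: bisect on sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if call_price(x, K, r, tau, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

market = call_price(100.0, 95.0, 0.05, 0.5, 0.25)   # synthetic market quote
print(implied_vol(market, 100.0, 95.0, 0.05, 0.5))  # recovers 0.25
```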

9.7 Inverse Problems of Image Processing


9.7.1 Fundamental Steps in Digital Image Processing
Image acquisition: The types of images in which we are interested are gener-
ated by the combination of an illumination source and reflection or absorption
of energy from that source. The illumination may originate from a source of
electromagnetic energy such as radar, infrared or X-ray energy. Sensors are


used to transform illumination energy into digital images. Put simply, incoming
energy is transformed into voltage by a combination of input electrical
power and sensor material that is responsive to the particular type of energy
detected. The output voltage waveform is the response of the sensor, and a
digital quantity is obtained from each sensor by digitizing its response.
Image enhancement: Brings out obscured details or highlights certain fea-
tures of interest in an image.
Compression: Deals with techniques for reducing the storage required to
save an image. Image compression is familiar to most users of computers in
the form of image file extensions, such as jpg file extensions.
Morphological processing: Deals with the tools for extracting image com-
ponents that are useful in the representation and description of shape.
Segmentation: Partitions images into their constituent parts.
Representation and description: They follow the output of a segmenta-
tion stage which is usually raw pixel data, constituting the boundary of a
region or all points in a region.

9.7.2 Introduction to Medical Imaging


Gamma-ray imaging: Major uses of imaging based on gamma rays include
nuclear medicine and astronomical observations. In nuclear medicine, the ap-
proach is to inject a patient with a radioactive isotope that emits gamma rays
as it decays. Images are produced from the emissions collected by gamma
ray detectors. The principle is the same as for X-ray tomography. However,
instead of using an external source of X-ray energy, the patient is given a
radioactive isotope that emits positrons as it decays. When a positron meets an
electron, both are annihilated; an image is created using the basic principles of
tomography.
X-ray imaging: X-rays are among the oldest sources of electromagnetic ra-
diation used for imaging. The best known use of X-rays is for medical diag-
nostics, but they are also used extensively in industry and other areas, like
astronomy. X-rays for medical and industrial imaging are generated using a
vacuum tube with a cathode and anode. The cathode is heated, causing free
electrons to be released. These electrons flow at high speed to the positively
charged anode. When the electrons strike a nucleus, energy is released in the
form of X-ray radiation. The energy (penetrating power) of the X-rays is
controlled by a voltage applied across the anodes, and the number of X-rays
is controlled by a current applied to the filament in the cathode.
Imaging in ultraviolet band: Applications of ultraviolet light are varied.
They include lithography, industrial inspection, microscopy, lasers, biological
imaging and astronomical observation. We illustrate imaging in this band with


examples from microscopy and astronomy.
Ultraviolet light is used in fluorescence microscopy, one of the fastest grow-
ing areas of microscopy. Fluorescence is a phenomenon discovered in the mid-
dle of the 19th century, when it was first observed that the mineral fluorspar
fluoresces when ultraviolet light is directed upon it. The ultraviolet light is not
visible, but when a photon of ultraviolet radiation collides with an electron in
an atom of a fluorescent material, it elevates the electron to a higher energy
level. Subsequently, the excited electron relaxes to a lower level and emits
light in the form of a lower energy photon in the visible (red) light region.
The basic task of a fluorescence microscope is to use an excitation light to
irradiate a prepared specimen and then separate the much weaker radiation
fluorescent light from brighter excitation light.
Imaging in visible and infrared bands: Considering that the visual band
of the electromagnetic spectrum is the most familiar, it is not surprising that
imaging in this band outweighs by far all the others in terms of scope of appli-
cation. The infrared band is often used in conjunction with visual imaging in
areas such as light microscopy, astronomy, remote sensing, industry and law
enforcement.
Even in microscopy, the application areas are too numerous to detail here.
It is not difficult to conceptualize the types of processes one might apply to
these images, ranging from enhancements to measurements.
EEG: An electroencephalogram (EEG) is a test that measures and records
the electrical activity of the brain. Special sensors (electrodes) are attached
to a patient’s head and hooked by wires to a computer. The computer records
the brain’s electrical activity on the screen or on paper as wavy lines. Certain
conditions, such as seizures, can be revealed by the changes in the normal
pattern of the brain’s electrical activity.
ECG: The electrocardiogram (ECG) records the electrical activity of the
heart, where each heart beat is displayed as a series of electrical waves char-
acterized by peaks and valleys. An ECG gives two kinds of information: the
duration of the electrical wave crossing the heart, which in turn decides whether
the electrical activity is normal, slow or irregular, and the amount of
electrical activity passing through the heart muscle, which reveals whether
parts of the heart are too large or overworked. Normally, the frequency range
of an ECG signal is 0.05 to 100 Hz and its dynamic range 1 to 10 mV.
MRI: Magnetic resonance imaging (MRI) is a test that uses a magnetic field
and pulses of radio wave energy to depict organs and structures inside the
body. In many cases MRI gives more detailed information about structures in
the body than can be seen with an X-ray, ultrasound, or computed tomography
(CT) scan. MRI also may show problems that cannot be seen with other
imaging methods.
FMRI: Functional magnetic resonance imaging (FMRI) is a technique for
measuring brain activity. It works by detecting the changes in blood oxygena-


tion and flow that occur in response to neural activity. When a brain area
is more active, it consumes more oxygen; to meet this increased demand,
blood flow to the active area increases. FMRI can be used to produce
activation maps showing which parts of the brain are involved in a particular
mental process.

9.7.3 Tomography
Scientists who developed the field of tomography have received several No-
bel prizes. The word tomography comes from the Greek word for slice and the
discipline deals with exposing internal structures of non-transparent objects
by transmitting signals such as electromagnetic waves (such as radio waves,
microwaves and X-rays) and acoustic waves. In computed tomography, scan-
ner images do not provide ready images as X-ray systems do. Rather, the
images result from intricate data measurements and mathematical processes.
The scientists who have studied the relevance of wavelets and their variants
to tomography include Mark Bottema, Bill Moran and Sofia Suvorova [5],
Glenn Easley, Flavia Colonna, Demetrio Labate, and Kanghui Guo. Several
publications focus on variants of wavelets such as curvelets, ridgelets, complex
wavelets and vector valued wavelets.

9.8 Seismic Tomography


Seismic tomography is a methodology for analyzing and computing earth’s
activities. Seismology is the scientific study of earthquakes and the propaga-
tion of elastic waves through the earth or other planet-like bodies. It also in-
cludes studies of earthquake effects, such as tsunamis, and of diverse seismic
sources such as volcanic, tectonic, oceanic, atmospheric and artificial processes (such as
explosions). A record of earth motion as a function of time is called a seismo-
gram. Experts treat seismic tomography as a part of seismic imaging, where
they are mainly concerned with estimating properties such as propagating
velocities of compressional waves (P-waves) and shear waves (S-waves).
Seismic migration is a branch of seismic imaging in which the properties
to be estimated include the reflection coefficient or reflectivity.
Tomography is defined as a methodology where three-dimensional images
are derived from the integrated properties of a medium that rays encounter
along their paths through it. Thus seismic tomography may be thought of as
the derivation of a three-dimensional velocity structure from seismic waves.
Estimation of P-wave velocity is the simplest example of seismic tomography.
Several methods, such as refraction travel time tomography, finite frequency
travel time tomography, reflection travel time tomography and waveform
tomography, have been developed for seismic tomography.
Seismic tomography is normally formulated as an inverse problem. A beau-
tiful account of this special theme and inverse problems in general can be
found in Iske and Randen [17] and Vogel [42]. An application of wavelets in
tomography has been discussed in Bottema et al. [5].

9.9 Exercises
9.1. Solve the inverse problem of determining the depth D of a well from the
elapsed time t between the dropping of a stone and hearing of a splash.
Does the equation determining D have a unique solution? Is there a
unique feasible solution?
9.2. A stone is dropped into a well and 4.2 seconds later a splash is heard.
How deep is the well?
9.3. Find the weight distribution with total weight 1 that gives the shape

y(x) = 1 - \cos\frac{\pi}{x}.
9.4. Show that if a shape is a parabola its weight distribution is constant.
9.5. Find the solutions of Examples 304 through 307.
9.6. Find the matrix A that satisfies AX = B, where

X = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & 0 \\ 0 & 1 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -2 & 6 & 3 \\ -1 & 0 & 0 \\ 3 & 0 & {} \end{pmatrix}.

9.7. Find the solution of Example 308.


9.8. Show that a function u satisfying

-\frac{d}{dx}\left(k(x)\,\frac{du}{dx}\right) = f(x),

-k(0)u'(0) = A, \qquad -k(1)u'(1) = B

exists only if \int_0^1 f(s)\,ds = B - A.
9.9. Find the variable interest rate corresponding to the value history

u(t) = t + 5e10t .

9.10. Show that the two value histories have the same interest rate if and only
if their ratio is a positive constant.
9.10 Suggestion for Further Reading


Direct problems for the equations in Section 9.6.5 are discussed in Chapters 5
and 7. Study of inverse problems related to these equations may be challeng-
ing. References [1], [4], [5], [7], [9], [11], [13], [17], [18], [24], [32] and [39] provide
interesting results in this field. As we have seen there is a close connection of
tomography, radon transform and medical imaging. Kutchment [31] presents
updated developments in this area. Readers interested in applications of in-
verse problems in medical and biological science may pursue references [31,
37]. For advanced knowledge and applications in these areas, we refer to [2], [3],
[6], [8], [10], [12], [14] through [16], [19] through [23], [25] through [30], [33]
through [36], [38] through [41] and [42].
Bibliography

[1] G. Bal, Inverse Transport theory and applications, Inverse Problems,


25, 053001, 2009.

[2] G. Bal, Cauchy problem ultrasound modulated EIT, Analysis of Partial


Differential Equations, 6: 751-775, 2013.
[3] G. Bal, D. Finch, P. Kuchment, P. Stefanov, G. Uhlmann (Eds.), To-
mography and Inverse Transport Theory, AMS, Providence, RI, 2011.

[4] C. Börgers and F. Natterer (Eds), Computational Radiology and Imag-


ing. Therapy and Diagnostics, Springer-Verlag, New York, 1999.
[5] M. Bottema, B. Moran and S. Suvorova, An application of wavelets in
tomography, Digital Signal Processing, 8: 244-254, 1998.

[6] R. Ewing (Ed.), The Mathematics of Reservoir Simulation, SIAM,


Philadelphia, 1983.
[7] T. M. Buzug, Computed Tomography, Springer, Berlin, 2008.
[8] D. Colton, R. Ewing, W. Rundell (Eds.), Inverse Problems in Partial
Differential equations, SIAM, Philadelphia, 1990.

[9] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scatter-


ing Theory, Second Edition, Springer, Berlin, 1998.
[10] P. Elbau, O. Scherzer, R. Schulze, Reconstruction formulas for photoa-
coustic sectional imaging, Inverse Problems, 28, 045004, 2012.

[11] H. W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse problems,


Kluwer, Dordrecht, 1996.
[12] K. M. Furati, P. Manchanda, A. H. Siddiqi, Wavelet methods in oil
industry, in Proceedings of IWW, Istanbul Aydin University, pp.26-36,
2008.

[13] K. M. Furati, M. Z. Nashed and A. H. Siddiqi(Eds.), Mathematical Mod-


els and Methods for Real World Systems. Chapman & Hall, New York,
2006.

[14] K. M. Furati, H. Tawfiq and A. H. Siddiqi, Simulation and visualization


of safing sensor by fast flow, AM. J. Appl. Sci., 2: 1261-1265, 2005.

[15] L. Gao, K. J. Parker, S. K. Alam, Sonoelasticity imaging: theory and


experimental verification, J. Acoust. Soc. Amer., 97: 3875-3880, 1995.
[16] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA,
1993.

[17] A. Iske and T. Randen (Eds), Mathematical Methods and Modelling in


Hydrocarbon Exploration and Production. Springer, New York, 2006.
[18] C. W. Groetsch, Inverse Problems in Mathematical Sciences, Viewag,
Braunschweig, Wiesbaden, 1994.

[19] C. W. Groetsch, Inverse Problems: Activities for Undergraduates, Math-


ematical Association of America, 1999.
[20] I. Hazou and D. Solmon, Inverse of exponential Radon transform II,
Analysis, Math. Methods Appl. Sci., 13: 205-218, 1990.
[21] I. Hazou and D. Solmon, Inversion of the exponential Radon Transform
II, Numerics, Math. Methods Appl. Sci., 13: 109-119, 1990.
[22] S. Helgason, The Radon Transform, Second Edition, Cambridge Univer-
sity, 1999.
[23] V. Isakov, Inverse Source Problems, AMS, New York, 1990.

[24] V. Isakov, Inverse Problems for Partial Differential Equations, Second


Edition, Springer, Berlin, 2005.
[25] A. C. Kak and M. Slaney, Principles of Computerized Tomographic
Imaging, SIAM, Philadelphia, 2001.

[26] A. I. Katsevich, Local tomography for the generalized Radon Transform,


SIAM J. Appl. Math., 57: 1128-1162, 1997.
[27] A. Katsevich, Local tomography with nonsmooth attenuation, Trans.
Amer. Math. Soc., 351: 1947-1974, 1999.
[28] A. Kirsch, An Introduction to the Mathematical Theory of Inverse Prob-
lem, Springer, New York, 1996.
[29] A. Kirsch and N. Grinberg, The Factorization Model for Inverse Prob-
lems, Oxford University Press, London, 2008.
[30] A. Kirsch and O. Scherzer, Simultaneous reconstructions of absorption
density and wave speed with photoacoustics measurements, SIAM, J.
Appl. Math., 72: 1508-1523, 2012.
[31] P. Kuchment, The Radon Transform and Medical Imaging, Vol.85, Re-
gional Conference Series in Applied Mathematics, SIAM, 2014.

[32] P. Monk, Finite Element Methods for Maxwell’s Equations, Oxford Sci-
ence Publications, Oxford, 2003.
[33] F. Natterer, The Mathematics of Computerized Tomography, Wiley, New
York, 1986, reprinted in 2001 by SIAM.

[34] F. Natterer and F. Wübbeling, Mathematical Methods in Image Recon-


struction, SIAM, Philadelphia, 2001.
[35] H. Neunzert and A. H. Siddiqi, Topics in Industrial Mathematics: Case
studies and Related Mathematical Methods. Kluwer, Dordrecht, 2000.

[36] A. H. Siddiqi, Applied Functional Analysis, Marcel Dekker, New York,


2004. Indian Edition by Anamaya, New Delhi, 2010.
[37] A. H. Siddiqi, R. C. Singh and P. Manchanda (Eds.), Mathematics in
Science and Technology, World Scientific, Singapore, 2011.
[38] A. H. Siddiqi, A. K. Gupta and M. Brokate (Eds.), Modelling of En-
gineering and Technological Problems, American Institute of Physics,
Melville, NY, 2009.
[39] A. H. Siddiqi, I. S. Duff and O. Christensen, Modern Mathematical Mod-
els: Methods and Algorithms for Real World Systems, Anshan and Ana-
maya, 2007.

[40] A. H. Siddiqi and K. M. Furati, Fast Wavelet Algorithms for Simulation


Safing Sensor Final Report, KFUPM, Dhahran, Saudi Arabia, 2005.
[41] J. Tittman, Geological Well Logging, Academic Press, London, 1986.
[42] C. R. Vogel, Computational Methods for Inverse Problems, Frontiers in
Applied Mathematics, SIAM, Philadelphia, 2002.
Chapter 10
Wavelets

10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627


10.2 Overview of Wavelet Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
10.2.1 Definition and Example of Wavelets . . . . . . . . . . . . . . . . . . . . 628
10.2.2 Multiresolution Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
10.3 Applications of Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
10.3.1 Applications of Wavelets to Biometrics . . . . . . . . . . . . . . . . . 639
10.3.2 CAT Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
10.3.3 Seismic Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
10.3.4 Variants of Wavelets in Medical Imaging . . . . . . . . . . . . . . . 644
10.3.5 Applications in Power Systems (Figure 10.17) . . . . . . . . . . 647
10.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
10.5 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649

10.1 Introduction
Wavelet theory is the outcome of a multidisciplinary endeavor that brought
together mathematicians, physicists and engineers. This interaction created a
flow of ideas that goes well beyond the construction of new bases and trans-
forms.
In fact, wavelet theory is a refinement and extension of Fourier analysis.
Shortcomings of Fourier analysis were realized as early as 1946. To remove
these deficiencies, the physics Nobel laureate Dennis Gabor introduced the
windowed Fourier transform (short-time Fourier transform). This transform suffered
from some algorithmic difficulties and to eliminate them wavelet theory was
introduced.
During the 1980s, the French geophysicist Jean Morlet developed a new approach,
the wavelet transform, while studying problems in oil and gas exploration. The
term “wavelet” arose because the function from which the wavelet transform
is constructed can be thought of as a localized wave.
In the last 35 years wavelet methods have been applied to diverse fields of
science, engineering and technology [1] through [5], [9] through [11], [14], [17]
through [40]. Typical applications are presented in these references to: physics,
geophysics, function approximation, signal processing, harmonic analysis, dif-
ferential equations, electrical engineering, biomedical engineering, EEG, ECG,

FIGURE 10.1: Prof. Yves Meyer

MRI, finance and financial data analysis, mechanical engineering, civil engi-
neering, remote sensing, forestry and historical document retrieval.
A leading worker in this field, Prof. Yves Meyer, received the 2017 Abel
Prize for his significant contributions to wavelet methods.
In this chapter we give an overview of wavelet methods and their appli-
cations to several areas of engineering and technology, namely biometrics,
computed axial tomography, seismic tomography, medical imaging and power
systems.

10.2 Overview of Wavelet Methods


10.2.1 Definition and Example of Wavelets
The definition of a wavelet is based on the notion of an orthonormal basis
of functions. (Without explicitly mentioning wavelets, we have used them
in Fourier analysis already.) A system {f1 , f2 , . . . } of real valued functions


defined on R is said to be orthonormal, if
\langle f_m, f_n \rangle = \int_{-\infty}^{\infty} f_m(x)\,f_n(x)\,dx = 0, \quad \text{whenever } m \neq n, \qquad (10.1)

and \langle f_n, f_n \rangle = 1 for all n.


This is the same as Definition 77, except that we now consider R instead of
an interval [a, b] as the domain of the functions fn . The system {fn } is called
an orthonormal basis (of the space of all square integrable functions) if it is
complete in the sense that every square integrable function f can be expressed
either as a linear combination of functions from the system {fn } or as a limit
of such linear combinations, namely
\lim_{N\to\infty} \int_{-\infty}^{\infty} \left( f(x) - \sum_{n=1}^{N} c_n f_n(x) \right)^2 dx = 0 \qquad (10.2)

for suitable coefficients cn .


Definition 97.
L^2(\mathbb{R}) = \left\{ f : \mathbb{R} \to \mathbb{R} : \int_{-\infty}^{\infty} |f(t)|^2\,dt < \infty \right\}.

Definition 98. (Vanishing moments) Let N be a positive integer. A function


ψ : R → R has N vanishing moments if
\int_{-\infty}^{\infty} x^l\,\psi(x)\,dx = 0 \quad \text{for } l = 0, 1, 2, \dots, N-1.

Definition 99. (Wavelet, wavelet family) Let ψ be a real valued function on


R which is square integrable. For any integers j, k we define the functions

\psi_{j,k}(t) = 2^{j/2}\,\psi(2^j t - k), \quad t \in \mathbb{R}. \qquad (10.3)

If {ψj,k }j,k∈Z is an orthonormal basis of the space of all square integrable


functions, then ψ is called a wavelet and the family {ψj,k } is called a wavelet
family.
Example 311. (Haar wavelet) Let ψ be defined as follows:

\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise.} \end{cases}

It can be checked that {\psi_{j,k}} is an orthonormal basis for the square integrable
functions.
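This can be verified numerically for a few members of the family. The sketch below (our own illustration) approximates the inner products with the midpoint rule, which is exact here because the integrands are piecewise constant on dyadic intervals aligned with the grid:

```python
def haar(t):
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def psi(j, k, t):
    """psi_{j,k}(t) = 2^{j/2} psi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * haar(2.0 ** j * t - k)

def inner(f, g, lo=-4.0, hi=4.0, n=1 << 14):
    """Midpoint-rule approximation of the L2 inner product."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) * g(lo + (i + 0.5) * h)
               for i in range(n)) * h

n00 = inner(lambda t: psi(0, 0, t), lambda t: psi(0, 0, t))
n11 = inner(lambda t: psi(1, 1, t), lambda t: psi(1, 1, t))
c01 = inner(lambda t: psi(0, 0, t), lambda t: psi(1, 1, t))
print(n00, n11, c01)   # ≈ 1, 1, 0: unit norms and orthogonality
```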
FIGURE 10.2: Haar Wavelet

The graph of ψ is given on the right side of Figure 10.2.


Example 312. (Mexican Hat Wavelet) Let the function ψ be defined as

\psi(t) = c\,(1 - t^2)\,e^{-t^2/2},

where c is a suitable constant to ensure orthonormality of the family {ψj,k }.


Indeed, ψ is a wavelet, it is called the Mexican hat wavelet or the Ricker
wavelet. Its graph (with c = 1) is given in Figure 10.3.

FIGURE 10.3: Mexican Hat Wavelet

Example 313. (Daubechies Wavelets) The Daubechies wavelets (Figure
10.4), based on the work of Ingrid Daubechies, are orthogonal wavelets defining
a discrete wavelet transform and characterized by a maximal number of
vanishing moments for some given support. With each wavelet type of this
class, there is a scaling function (called the father wavelet) which generates
an orthogonal multiresolution analysis. The Daubechies wavelets are not
defined in terms of the resulting scaling and wavelet functions; in fact, these
functions cannot be written down in closed form.

FIGURE 10.4: Daubechies 20 2-D Wavelet (Wavelet Fn × Scaling Fn)


FIGURE 10.5: Db4

Db4 and Db10 (Figures 10.5 and 10.6) are from the Daubechies family of
wavelets. Db wavelets have no analytical expression; they are constructed
numerically.
Definition 100. Let ψ be a wavelet. The wavelet coefficients d_{j,k} of a
square integrable function f are defined as

d_{j,k} = ⟨f, ψ_{j,k}⟩ = ∫_{−∞}^{∞} f(t) ψ_{j,k}(t) dt.        (10.4)

FIGURE 10.6: Db10

Since the wavelet family is an orthonormal basis, one can show that for
square integrable functions f

f(t) = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} d_{j,k} ψ_{j,k}(t).        (10.5)

The right-hand side is called the wavelet series or the wavelet representation
of f. The limit in the double sum is understood as convergence in the
quadratic mean, as in (10.2),

lim_{N→∞} ∫_{−∞}^{∞} |f(t) − Σ_{j=−N}^{N} Σ_{k=−N}^{N} d_{j,k} ψ_{j,k}(t)|² dt = 0.

Remark 76. The wavelet coefficients d_{j,k} measure the frequency content of
f near the point t = 2^{−j} k. Small values of j correspond to low frequency
components and high values of j correspond to high frequency components.
According to the above,

f(t) ≈ Σ_{j=−N}^{N} Σ_{k=−N}^{N} d_{j,k} ψ_{j,k}(t)        (10.6)

if N is sufficiently large. One of the advantages of wavelets compared to other
tools of signal processing is that it is often possible to obtain a good
approximation of f using only relatively few coefficients d_{j,k}, so the
computational effort is not large. This does not necessarily mean that N is
small. But the localizing effect of wavelets often leads to situations where
most of the wavelet coefficients {d_{j,k}}_{|j|,|k|≤N} vanish or are very small,
for example, when f varies only slowly on most parts of its domain and has
high fluctuations in small regions only.
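One consequence of (10.5) and orthonormality is the Parseval relation Σ_{j,k} d_{j,k}² = ‖f‖². The following sketch checks this numerically for the Haar wavelet with f(t) = t on [0, 1); the truncation ranges and quadrature grid are our own illustrative choices:

```python
def haar(t):
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def psi(j, k, t):
    return 2 ** (j / 2) * haar(2 ** j * t - k)

def f(t):
    return t if 0 <= t < 1 else 0.0   # test signal with ||f||^2 = 1/3

def coeff(j, k, n=4096):
    """d_{j,k} = <f, psi_{j,k}>, midpoint quadrature on [0, 1) where f lives."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) * psi(j, k, (i + 0.5) * h) for i in range(n)) * h

energy = 0.0
for j in range(-8, 8):
    # k-values whose support [k 2^-j, (k+1) 2^-j) meets [0, 1)
    ks = range(2 ** j) if j >= 0 else [0]
    for k in ks:
        energy += coeff(j, k) ** 2

print(energy)   # close to ||f||^2 = 1/3, as orthonormality demands
```

The small gap between the printed value and 1/3 comes from truncating the sum at |j| ≤ 8, illustrating how few coefficients already capture most of the energy.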

In Fourier analysis we studied the Fourier series and the Fourier transform.
For wavelets, we have wavelet coefficients and wavelet transforms.
Definition 101. (Wavelet transform) Let ψ be a square integrable real valued
function on R. For a, b ∈ R with a ≠ 0 we set

ψ_{a,b}(t) = |a|^{−1/2} ψ((t − b)/a).

The wavelet transform of any square integrable function f with respect to
ψ is denoted by W_ψ[f] and defined by

W_ψ[f](a, b) = ⟨f, ψ_{a,b}⟩ = |a|^{−1/2} ∫_{−∞}^{∞} f(t) ψ((t − b)/a) dt.        (10.7)

Remark 77. (a) The wavelet transform of f is a function of two variables


(a and b), in contrast to the Fourier transform which is a function of a single
variable. While both transforms yield information concerning the frequency
domain, the wavelet transform exhibits information related to the time domain
much more readily than the Fourier transform.
(b) The wavelet coefficients d_{j,k} of f are related to the wavelet transform by

d_{j,k} = ⟨f, ψ_{j,k}⟩ = W_ψ[f](2^{−j}, k 2^{−j}).
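The transform (10.7) can be explored numerically. The sketch below uses the Mexican hat of Example 312 (with c = 1); the test signal, grid sizes, and integration window are our own illustrative choices. It evaluates W_ψ[f](1, b) over a range of b and shows that the transform peaks near the location of the signal's bump, illustrating the time localization discussed in Remark 77(a):

```python
import math

def mexican_hat(t):
    """psi(t) = (1 - t^2) exp(-t^2/2), the Mexican hat of Example 312, c = 1."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def f(t):
    return math.exp(-(t - 2.0) ** 2)   # a bump centered at t = 2 (our test signal)

def W(a, b, lo=-8.0, hi=12.0, n=2000):
    """W_psi[f](a, b) = |a|^{-1/2} * integral of f(t) psi((t - b)/a) dt, (10.7)."""
    h = (hi - lo) / n
    s = sum(f(lo + (i + 0.5) * h) * mexican_hat(((lo + (i + 0.5) * h) - b) / a)
            for i in range(n))
    return s * h / math.sqrt(abs(a))

bs = [i * 0.1 for i in range(-40, 81)]           # translation parameters b
vals = [W(1.0, b) for b in bs]
b_star = bs[max(range(len(bs)), key=lambda i: vals[i])]
print(b_star)   # the transform at scale a = 1 peaks near the bump location t = 2
```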

While the formulas defining the wavelet transform make sense for any
square integrable function ψ, as in the definition above, the wavelet transform
is useful only if it is invertible, like the Fourier transform, because only in
that case can we retrieve the original function from its wavelet transform. As
an extreme case, setting ψ = 0 we have W_ψ[f] = 0 for all f, so all information
is lost when we apply W_ψ. A sufficient condition for invertibility is given in
the following definition.
Definition 102. Let ψ be integrable as well as square integrable on R. We
say that ψ is admissible if ∫_{−∞}^{∞} ψ(t)² dt = 1 and

c_ψ = ∫_{−∞}^{∞} |ψ̂(ξ)|² / |ξ| dξ < ∞.        (10.8)
Remark 78. A function ψ can be admissible according to this definition only if

∫_{−∞}^{∞} ψ(t) dt = 0.

Indeed, since ψ is assumed to be integrable on R, its Fourier transform ψ̂ is
a continuous function according to Remark 50(ii); therefore the integral in
(10.8) would be divergent if ψ̂(0) ≠ 0. Hence admissibility implies that

∫_{−∞}^{∞} ψ(t) dt = ψ̂(0) = 0.

Remark 79. The function ψ displays the time information and hides the
information about frequencies, while the Fourier transform ψ̂ displays
information about frequencies and hides the time information. The energy of
the two signals represented by ψ and ψ̂ is the same, that is,

‖ψ̂‖² = ∫_{−∞}^{∞} |ψ̂(ξ)|² dξ = ∫_{−∞}^{∞} |ψ(t)|² dt = ‖ψ‖².

Theorem 75. (Inverse wavelet transform) Let ψ be admissible according to
Definition 102. Then for any square integrable f we have

f(t) = (1/c_ψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} W_ψ[f](a, b) ψ_{a,b}(t) (1/a²) da db.        (10.9)

The right hand side in (10.9) is to be understood as the limit for ε → 0,
in the mean quadratic sense, of the improper integrals

f_ε(t) = (1/c_ψ) ∫_{−∞}^{∞} ∫_{|a|≥ε} W_ψ[f](a, b) ψ_{a,b}(t) (1/a²) da db.
During the above exposition, as well as previously for the Fourier transform,
three concepts have appeared repeatedly, namely, dilation, translation (time
shift) and modulation (frequency shift). These may be viewed as operators
whose domains and ranges are sets of functions.
Definition 103. (a) (Dilation) For any real a ≠ 0, the dilation operator
D_a acts on functions f : R → R and yields the function D_a f defined by

(D_a f)(t) = |a|^{1/2} f(at).        (10.10)

(b) (Translation) For any b ∈ R, the translation operator T_b acting on f
as in part (a) is defined as

(T_b f)(t) = f(t − b).        (10.11)

(c) (Modulation) For any c ∈ R, the modulation operator M_c acting on
f as in part (a) is given by

(M_c f)(t) = e^{2πict} f(t).        (10.12)
Remark 80. (a) We observe that if a > 1, then D_a f is a compressed version
of f, and if 0 < a < 1, then D_a f is a spread-out version of f. If a < 0, D_a f
is, in addition, a reflected version of f.
(b) It can be checked from (4.29) that for every a > 0, (D_a f)^∧(ξ) = (D_{1/a} f̂)(ξ).
This means that when a function is compressed by the factor a > 1, its Fourier
transform is spread out by the factor 0 < 1/a < 1.
(c) We can check that for every c ∈ R, (M_c f)^∧(ξ) = (T_c f̂)(ξ). This means that
modulation in the time variable corresponds to translation in the frequency
variable, often referred to as frequency shift.

Dilation, translation and modulation have the following properties:

D_a T_b f(t) = a^{1/2} f(at − b),              D_a T_b f(t) = T_{a^{−1}b} D_a f(t),
⟨f, D_a g⟩ = ⟨D_{a^{−1}} f, g⟩,                ⟨f, T_b g⟩ = ⟨T_{−b} f, g⟩,
⟨f, D_a T_b g⟩ = ⟨T_{−b} D_{a^{−1}} f, g⟩,     ⟨D_a f, D_a g⟩ = ⟨f, g⟩,
⟨T_b f, T_b g⟩ = ⟨f, g⟩,                       T_b M_c f(t) = e^{−2πibc} M_c T_b f(t),
⟨f, M_c g⟩ = ⟨M_{−c} f, g⟩,                    ⟨f, T_b M_c g⟩ = ⟨T_{−b} M_{−c} f, g⟩ e^{2πibc}.
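Two of these identities can be spot-checked numerically. The following sketch checks D_a T_b f = T_{a^{−1}b} D_a f pointwise and ⟨D_a f, D_a g⟩ = ⟨f, g⟩ by quadrature; the sample Gaussian signals and grid sizes are our own illustrative choices:

```python
import math

def D(a, f):
    """Dilation operator (10.10): (D_a f)(t) = |a|^{1/2} f(at)."""
    return lambda t: math.sqrt(abs(a)) * f(a * t)

def T(b, f):
    """Translation operator (10.11): (T_b f)(t) = f(t - b)."""
    return lambda t: f(t - b)

def inner(u, v, lo=-8.0, hi=8.0, n=4000):
    """Midpoint-rule approximation of <u, v> on [lo, hi]."""
    h = (hi - lo) / n
    return sum(u(lo + (i + 0.5) * h) * v(lo + (i + 0.5) * h) for i in range(n)) * h

f = lambda t: math.exp(-t * t)              # sample signals (our choice)
g = lambda t: math.exp(-(t - 1.0) ** 2)
a, b = 2.0, 3.0

# identity D_a T_b f = T_{a^{-1} b} D_a f, checked pointwise on a grid
lhs, rhs = D(a, T(b, f)), T(b / a, D(a, f))
max_diff = max(abs(lhs(t) - rhs(t)) for t in [x * 0.01 for x in range(-500, 501)])

# identity <D_a f, D_a g> = <f, g>
ip_scaled, ip_plain = inner(D(a, f), D(a, g)), inner(f, g)

print(max_diff, abs(ip_scaled - ip_plain))   # both essentially zero
```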

It may be seen that the Haar wavelet has only one vanishing moment. It can
be proved that if ψ has a large number of vanishing moments, then only few
wavelet coefficients ⟨f, ψ_{j,k}⟩ will be large.
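A quick numerical check of the claim that the Haar wavelet has exactly one vanishing moment (midpoint quadrature on the support [0, 1); the grid size is our own illustrative choice):

```python
def haar(t):
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def moment(l, n=20000):
    """Midpoint approximation of the l-th moment of the Haar wavelet on [0, 1)."""
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** l * haar((i + 0.5) * h) for i in range(n)) * h

print(moment(0), moment(1))  # 0th moment vanishes; 1st (about -1/4) does not
```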

10.2.2 Multiresolution Analysis


We introduce here a concept called multiresolution analysis (MRA) that
provides a method to construct wavelets. Multiresolution means that we want
to use different degrees of resolution to approximate a given function f , rang-
ing from coarse to fine structures, like a camera zooming in on an object.
These different degrees of resolution are described mathematically by a nested
sequence of vector spaces of functions,

⋯ ⊂ V_{−2} ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ ⋯        (10.13)

The concept of a vector space is treated in linear algebra; a set of functions


forms a vector space V if addition and scalar multiplication of functions from
V never leads to functions which do not belong to V . The inclusions in (10.13)
mean that the information on some level of resolution is also included in all
finer resolutions.
Definition 104. (Multiresolution analysis) A sequence {Vj } of vector spaces
of square integrable functions is called a multiresolution analysis (MRA)
with the scaling function ϕ if the following conditions are satisfied.
(a) The sequence {Vj } is nested,

⋯ ⊂ V_{−2} ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ ⋯

(b) We have ∩_j V_j = {0}, that is, the zero function is the only function which
belongs to the intersection of all spaces V_j. In other words, no nontrivial
function belongs to all V_j.
(c) Every square integrable function can be expanded in a (finite or infinite)
series Σ_j a_j g_j of functions g_j ∈ V_j, where convergence is understood in the
mean quadratic sense. Equivalently, every square integrable function can be
approximated to arbitrary precision by elements of V_j.

(d) f is an element of V_j if and only if D_2 f (that is, f compressed by a factor
2) is an element of V_{j+1}.
(e) The set of translated functions T_k ϕ, for all integers k, forms an orthonormal
basis for V_0. In particular, the scaling function ϕ is an element of V_0 and is
orthogonal to all of its translates (T_k ϕ)(t) = ϕ(t − k) for k ≠ 0.
From the scaling function ϕ of an MRA one can construct a wavelet ψ. By
(d), the function √2 D_{1/2} ϕ belongs to V_{−1}. Since V_{−1} ⊂ V_0 due to (a),
by (e) we can expand √2 D_{1/2} ϕ in a series,

ϕ(t/2) = √2 (D_{1/2} ϕ)(t) = Σ_{k∈Z} a_k ϕ(t − k),

such that Σ_{k∈Z} a_k² converges. One then can prove (we will not do it here) that

ψ(t) = Σ_{k∈Z} (−1)^k a_{1−k} ϕ(2t − k)        (10.14)

is a wavelet in the sense of Definition 99.


Example 314. (Haar MRA) Let V_0 be the space of all functions which are
square integrable on R and constant on all intervals (k, k + 1) for integer k.
According to (d) of Definition 104, V_j is taken as the space of square integrable
functions which are constant on all intervals of the form [k 2^{−j}, (k + 1) 2^{−j}).
For ϕ we choose

ϕ(t) = χ_{[0,1)}(t) = 1 if 0 ≤ t < 1,  0 otherwise.

One can show that {V_j} is a multiresolution analysis with the scaling function
ϕ. The Haar wavelet

ψ(t) = 1 if 0 ≤ t < 1/2,  −1 if 1/2 ≤ t < 1,  0 otherwise

is related to ϕ by ψ(t) = ϕ(2t) − ϕ(2t − 1).
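In the Haar MRA, projecting a function onto V_j simply means averaging it over the dyadic intervals of width 2^{−j}. The following sketch (sample function f(t) = t² and all grid sizes are our own choices) shows the approximation error shrinking as j grows, and encodes the relation ψ(t) = ϕ(2t) − ϕ(2t − 1):

```python
def phi(t):
    """Haar scaling function: characteristic function of [0, 1)."""
    return 1.0 if 0 <= t < 1 else 0.0

def psi(t):
    """Haar wavelet from the scaling relation psi(t) = phi(2t) - phi(2t - 1)."""
    return phi(2 * t) - phi(2 * t - 1)

def proj_error(f, j, n=4096):
    """||f - P_j f||^2 on [0, 1), where P_j f averages f over dyadic cells."""
    m = 2 ** j            # number of cells [k 2^-j, (k+1) 2^-j) in [0, 1)
    per = n // m          # quadrature points per cell
    h = 1.0 / n
    err = 0.0
    for c in range(m):
        pts = [f((c * per + i + 0.5) * h) for i in range(per)]
        avg = sum(pts) / per
        err += sum((p - avg) ** 2 for p in pts) * h
    return err

f = lambda t: t * t       # sample function (our choice)
e2, e5 = proj_error(f, 2), proj_error(f, 5)
print(e2, e5)             # the error shrinks as the resolution level j grows
```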


Example 315. (Shannon MRA) Let V_0 be the space of all square integrable
functions having bandwidth not exceeding π, that is, f̂(ξ) = 0 whenever
|ξ| > π. According to (d) of Definition 104, V_j is taken as the space of square
integrable functions of bandwidth not exceeding 2^j π. The Shannon scaling
function is given by

ϕ(t) = sin(πt) / (πt).

It can be verified that the set of translates T_k ϕ is orthonormal, and due to
the Whittaker-Shannon formula (with l = π) for functions in V_0 we have

f(t) = Σ_{n=−∞}^{∞} f(n) · sin(π(t − n)) / (π(t − n)),

so the translates form an orthonormal basis of V_0. It can be verified that

ψ(t) = ϕ(t − 1/2) − 2ϕ(2t − 1) = (sin(2πt) − cos(πt)) / (π(t − 1/2))

is the Shannon wavelet.
The Shannon wavelet family is given by

ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k)
           = 2^{j/2} (sin(2π(2^j t − k)) − cos(π(2^j t − k))) / (π(2^j t − k − 1/2)).

Moreover, it can be checked that ψ satisfies

ψ̂(ξ) = −e^{−iξ/2} χ_A(ξ),

where A consists of all ξ ∈ [−2π, −π] together with all ξ ∈ [π, 2π]. That is,
on each of these intervals ψ̂(ξ) = −e^{−iξ/2}, and for ξ outside of these intervals,
ψ̂(ξ) = 0.
Example 316. Show that ψ(x) = (1 − x²) e^{−x²/2} is a wavelet.
Solution: It is the Mexican hat basic wavelet, which is one of the oldest
wavelets. Because of its fast decaying e^{−x²/2} factor, it decays to zero very
fast. However, it does not vanish identically beyond a finite interval. Thus it
does not have (compact) support, but we may say that it essentially has
compact support, as shown in Figure 10.7.

FIGURE 10.7: The Mexican Hat Wavelet ψ(t)

It is easy to see that this

wavelet satisfies the basic condition for any wavelet, namely, that its integral
over (−∞, ∞) vanishes. This follows from the expression

(1 − x²) e^{−x²/2} = −(d²/dx²) e^{−x²/2},

which can be shown easily:

−(d²/dx²) e^{−x²/2} = (d/dx) (x e^{−x²/2})
                    = x(−x) e^{−x²/2} + e^{−x²/2}
                    = (1 − x²) e^{−x²/2}.

With this result we have

∫_{−∞}^{∞} ψ(x) dx = ∫_{−∞}^{∞} (1 − x²) e^{−x²/2} dx
                   = −∫_{−∞}^{∞} (d²/dx²) e^{−x²/2} dx
                   = −[(d/dx) e^{−x²/2}]_{x=−∞}^{x=∞}
                   = [x e^{−x²/2}]_{x=−∞}^{x=∞}
                   = 0 − 0 = 0

after using L'Hôpital's rule for the indeterminate form

lim_{x→±∞} x / e^{x²/2} = lim_{x→±∞} 1 / (x e^{x²/2}) = 0.
The Mexican hat wavelet did not prove useful for the discrete wavelet series
analysis. However, its simple expression makes it a good example for illustrat-
ing the continuous wavelet transform, its inverse, and other properties.
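The vanishing integral derived above can also be confirmed numerically. A minimal sketch using the midpoint rule; the truncation of the integral to [−10, 10] is our own choice, justified by the fast Gaussian decay:

```python
import math

def psi(x):
    return (1.0 - x * x) * math.exp(-x * x / 2.0)   # Mexican hat, c = 1

# midpoint rule on [-10, 10]; the tails beyond that are negligible
lo, hi, n = -10.0, 10.0, 100000
h = (hi - lo) / n
integral = sum(psi(lo + (i + 0.5) * h) for i in range(n)) * h
print(integral)   # essentially zero, in agreement with the derivation above
```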
Example 317. (Shannon wavelet) We present the Shannon scaling function

φ(t) = sin(πt)/(πt) = sinc(πt),   −∞ < t < ∞,

and its associated basic wavelet

ψ(t) = (sin(π(t − 1/2)) − sin(2π(t − 1/2))) / (π(t − 1/2))        (10.15)

as shown in Figures 10.8 and 10.9, respectively.

These wavelets are continuous; moreover, they are infinitely differentiable.
Also, in a certain sense they are better localized than the Haar wavelets.
However, even though they tend to zero as t → ±∞, they do not die out to
identically zero beyond any finite interval. Hence they do not have compact
support.
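The expression (10.15) and the scaling-function form ψ(t) = φ(t − 1/2) − 2φ(2t − 1) describe the same function. A numerical comparison (our own grid, chosen to avoid the removable singularity at t = 1/2, where the value is taken as the limit):

```python
import math

def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def phi(t):
    """Shannon scaling function sin(pi t)/(pi t)."""
    return sinc(t)

def psi(t):
    """Shannon wavelet built from the scaling function."""
    return phi(t - 0.5) - 2.0 * phi(2.0 * t - 1.0)

def psi_closed(t):
    """Closed form (sin(2 pi t) - cos(pi t)) / (pi (t - 1/2))."""
    if abs(t - 0.5) < 1e-12:
        return -1.0   # limit value at t = 1/2 (by L'Hopital's rule)
    return (math.sin(2 * math.pi * t) - math.cos(math.pi * t)) / (math.pi * (t - 0.5))

grid = [i * 0.013 for i in range(-300, 301)]   # avoids t = 1/2 exactly
max_diff = max(abs(psi(t) - psi_closed(t)) for t in grid)
print(max_diff)   # essentially zero: the two expressions agree
```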
FIGURE 10.8: Shannon Scaling Function

FIGURE 10.9: Shannon Wavelet Function

10.3 Applications of Wavelets


10.3.1 Applications of Wavelets to Biometrics
National Security - Automated methods capable of rapidly determining an
individual’s true identity, previously used identities and past activities.
Homeland Security and Law Enforcement - Technologies to secure the
U.S. while facilitating legitimate trade and movement of people and identifying
criminals in the civilian law enforcement environment.

Enterprise and E-government Services - Administration of people, processes
and technologies.
Personal Information and Business Transactions - Business plans that
meet customer demands for service at any time, from any location and through
multiple communication devices.
The term "biometric" is derived from the Greek words "bios" (life) and
"metron" (measurement); biometric identifiers are measurements from living
human bodies. Biometrics are automated methods of recognizing a person
based on physiological or behavioral characteristics, called biometric
identifiers or traits.
Biometric technologies are becoming the foundations of an extensive array
of highly secure identification and personal verification solutions.
Fingerprint Compression
Most of the fingerprint recognition systems, especially those in commercial
applications, do not store fingerprint images and only store numerical features
extracted from the images.
However, in certain applications (e.g., law enforcement), it may be neces-
sary to store the fingerprint images acquired during enrollment in a database
so that a trained expert can verify the matching results output by an Auto-
mated Fingerprint Identification System (AFIS).
Storing millions of fingerprint images (as in a large AFIS), or transmitting
these images through low-bandwidth networks is particularly demanding in
terms of space and time.
Hence, several ad hoc compression techniques have been proposed and one
of them, known as wavelet scalar quantization (WSQ), has been adopted as a
standard by the FBI in the United States.

FIGURE 10.10

Common biometric modalities include fingerprint, face, iris, gait, voice,
vascular patterns, signature, retina, hand geometry and facial thermography
(Figure 10.11).

FIGURE 10.11: Common Biometric Modalities

Figures 10.12 through 10.14 indicate how wavelets are used in image
compression.

FIGURE 10.12: Biometric Compression System

FIGURE 10.13: WSQ Encoder

FIGURE 10.14: Results (threshold 3.5; zeros 42%; retained energy 99.95%)

10.3.2 CAT Scan

A computerized axial tomography (CAT or CT) scan is generated from a
set of thousands of X-ray beams, consisting of 160 or more beams at each of
180 directions. We will explain here just one beam in order to comprehend
this large collection of X-rays. When a single X-ray beam of known intensity

passes through a medium, such as muscle or brain tissue, some of the energy in
the beam is absorbed by the medium and some passes through. The intensity
of the beam as it emerges from the medium can be measured by a detector.
The initial and final intensities tell us about the ability of the medium to
absorb energy.
For the sake of clarity we make some assumptions that present an idealized
view of what an X-ray is and how it behaves. Assume the X-ray beam is
composed of photons and is monochromatic. Each photon has the same energy
level E and the beam propagates at a constant frequency, with the same number
of photons per second passing through every centimeter of the path of the
beam. Let us denote by N(x) the number of photons per second passing through
a point x. Then the intensity of the beam at the point x is

I(x) = E · N(x).
Every substance through which an X-ray passes has the property that each
millimeter of the substance absorbs a certain proportion of the photons that
pass through it. This proportion, which is specific to the substance, is called
the attenuation coefficient of the material. A Hounsfield unit of the medium,
denoted by H_medium, is defined in terms of the true attenuation coefficient
A. Beer's law states

dI/dx = −A(x) I(x),   that is,   dI/I = −A(x) dx.
If the beam starts at a location x_0 with initial intensity I(x_0) and is detected,
after passing through the medium, at the location x_1 with final intensity
I(x_1), then we get

∫_{x_0}^{x_1} dI/I = −∫_{x_0}^{x_1} A(x) dx.

Thus

ln[I(x_1)] − ln[I(x_0)] = −∫_{x_0}^{x_1} A(x) dx.

Here we know the initial and final values of intensity I(x), and A(x) providing
special property of medium being sampled by the X-ray is unknown.
Thus from the measured intensity of X-ray we are able to determine not the
values of A itself, but the values of the integral of A along the line of X-ray.
What we can measure: we can design an X-ray emission/detection machine
that can measure the values of I(x). Hence we can compute from the equation
that is integral of the (unknown) attenuation coefficient function along the
path of X-ray.
What we want to know: The value of A(x) at each location depends on
the nature of matter located at the point x. We wish to know A(x).
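The relationship just derived can be illustrated with a toy computation. All numbers below (segment lengths, attenuation coefficients, initial intensity) are hypothetical illustrative values, not data from the text; the sketch shows that the measured intensities recover exactly the line integral of A:

```python
import math

# hypothetical attenuation profile along one X-ray path:
# (segment length in cm, attenuation coefficient A per cm) -- assumed values
segments = [(2.0, 0.10), (3.0, 0.25), (1.5, 0.05)]

line_integral = sum(length * A for length, A in segments)  # integral of A along the path
I0 = 1000.0                                                # initial intensity I(x0)
I1 = I0 * math.exp(-line_integral)                         # Beer's law gives I(x1)
recovered = math.log(I0) - math.log(I1)                    # what the detector data yield
print(line_integral, recovered)                            # the two agree
```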
Tomography comes from a Greek word meaning slice. In this field we want to
find the internal structure of a non-transparent object by sending signals such
as electromagnetic waves of different frequencies and acoustic waves.
The current excitement in tomographic imaging originated with Hounsfield
and Cormack's invention of X-ray computed tomography, for which they jointly
received the 1979 Nobel Prize in Physiology or Medicine.
The Radon transform, invented by the Austrian mathematician Johann
Radon, is the integral transform taking a scalar valued function to its integrals
over straight lines. By computerized (computed) tomography (CT) we mean
the reconstruction of a function from its line or plane integrals. Essentially it
amounts to inverting the Radon transform. Wavelet methods have been used
in studies of the Radon transform [42].

10.3.3 Seismic Tomography


Seismic tomography uses mathematical modeling of P and S wave travel
times to map velocity perturbations in the interior of the earth. The primary
energy source used in global seismic tomography is seismic waves generated by
earthquakes which pass through the Earth in all directions, and are recorded
on seismograms around the world. Inversion of arrival time data is used to de-
termine the speed of the waves at any given points. Use of seismic tomography
to interpret the internal structure of the earth is similar in technique to a CAT
scan. Computer assisted tomography (CAT) uses X-rays transmitted through
the body in many different directions. A mathematical method is then applied
to explain the loss in intensity of the X-rays due to the varying absorptive
properties of different parts of the body. CAT scans and seismic tomography
differ because X-rays travel in straight paths, whereas the ray paths of sound
waves bend with changes in the velocity structure of the medium.
Seismic tomography has several applications in exploration and global
geophysics. Seismic tomography can also be used to characterize fractured
bedrock, map groundwater reservoirs, and locate ore bodies. Global seismic
tomography is used to interpret ancient subducted slabs, locate the sources of
hotspots, and model convection patterns in the mantle.

Global seismic tomography is limited by the irregularities in time and


space of the source, and by the incomplete coverage of recording stations.
The primary sources are earthquakes, which are impossible to predict and only
occur at certain locations around the world. In addition, the global coverage
of recording stations is limited due to economic and political reasons. Because
of these limitations, seismologists must work with data that contains crucial
gaps. Experimental data cannot accurately replicate conditions deep in the
earth’s interior, making comparisons with real world data difficult. Another
limitation in imaging deep structures is attenuation and absorption of energy
due to the long distances waves travel, which reduces the resolution which can
be attained. Due to the problem of attenuation, the minimum sizes of features
in the mantle which can be resolved are blocks 100 to 200 km on each side.
In short, seismic tomography is a methodology for analyzing and com-
puting earth properties. Seismologists treat seismic tomography as a part of
seismic imaging, where they are mainly concerned with estimating properties
such as propagating velocities of P-waves and S-waves. Seismic tomography
may be thought of as the derivation of three dimensional velocity structures
of the earth from seismic waves [43], [44], [45]. Estimation of P-wave velocity
is the simplest example of seismic tomography, and seismic tomography is
formulated as an inverse problem.
Main Ideas
Several types of faults occur in the earth's crust.
The faults break due to accumulated stress along them. The sudden
release of energy is called an earthquake. The energy is released as seismic
waves that travel away from the earthquake location. The two major types of
waves produced are body waves and surface waves.
The waves can be measured by a seismometer. The timing
and amplitude of the seismic waves can be used to determine the location
and magnitude of the earthquake. Earthquakes commonly occur along plate
boundaries.
These waves also provide information on the structure of the earth. A clear
layering is visible.
Two very recent papers give current trends in the application of wavelets to
seismology. There is a general discussion of how to use wavelet and fractal
methods in the prediction of earthquakes. This is a challenging problem, but
scientists are optimistic.

10.3.4 Variants of Wavelets in Medical Imaging


PROPERTIES OF SHEARLETS
This section summarizes an important paper about shearlets published in
the Journal of Medical Imaging and Health Informatics in 2016 [48]. Shearlets
have several properties that make them useful in various disciplines such as
medicine, electricity generation, and petroleum chemistry. For example,
shearlets:

• Are well localized.


• Exhibit high levels of directional sensitivity.
• Satisfy the principles of parabolic scaling.
• Are spatially localized.
• Are optimally sparse.
Shearlets and Curvelets in Image Processing
Digital signal and image processing is an important technique for analyzing,
manipulating and processing real world data and images. The technology can
generate time series, collect data and calculate values. It can produce audio
signals, video images, seismic activity records, rainfall data and biomedical
test results.
Edges are prominent features of images. Analyzing and detecting edges
are important aspects of image processing even though they represent low-
level tasks in applications such as three-dimensional reconstruction, shape
recognition, and image compression, enhancement and restoration.
Shearlets and curvelets represent a novel directional multiscale mathematical
framework that is well adapted for identification and analysis of distributed
discontinuities such as edges in natural images. Despite their value, wavelets
have limited ability to deal with directional information.
The shearlet and curvelet [38] approach is designed to deal with directional
and anisotropic features typically present in natural images, and have the
ability to accurately and efficiently capture the geometric information of edges.
As a result, the shearlet framework provides effective algorithms for de-
tecting both the locations and orientations of edges, and for extracting and
classifying basic edge features such as corners and junctions.
The shearlet framework provides a unique combination of mathematical
rigor and computational efficiency when addressing edges.
Continuous Shearlet Transform (Figures 10.15 and 10.16)
For ψ ∈ L²(R²), the continuous shearlet system SH(ψ) is defined by

SH(ψ) = {ψ_{a,s,t} = T_t D_{A_a} D_{S_s} ψ : a > 0, s ∈ R, t ∈ R²},

with the anisotropic scaling matrix

A_a = ( a    0
        0  a^{1/2} ),

and shearing operator D_{S_s}, s ∈ R, where the shearing matrix S_s is given by

S_s = ( 1  s
        0  1 ),
FIGURE 10.15: (a) Support of the Fourier transform of a classical shearlet.
(b) Fourier domain support of several elements of the shearlet system, for
different values of a and s.

FIGURE 10.16: MRI of Brain

and T_t is the translation operator on L²(R^d), defined by

T_t ψ(x) = ψ(x − t), for t ∈ R^d.
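The action of the scaling matrix A_a and the shearing matrix S_s on the plane can be illustrated directly. A minimal pure-Python sketch (the helper names `mat_vec`, `A`, `S` are our own):

```python
def mat_vec(M, v):
    """Apply a 2x2 matrix (given as a tuple of rows) to a vector (x, y)."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def A(a):
    """Anisotropic (parabolic) scaling matrix A_a."""
    return ((a, 0.0), (0.0, a ** 0.5))

def S(s):
    """Shearing matrix S_s."""
    return ((1.0, s), (0.0, 1.0))

v = (1.0, 1.0)
print(mat_vec(A(4.0), v))   # (4.0, 2.0): x is stretched more than y
print(mat_vec(S(0.5), v))   # (1.5, 1.0): the shear slides x by s*y
```

The unequal stretching by A_a is what gives shearlets their elongated, direction-sensitive supports, and S_s rotates those supports through shear rather than rotation.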

Shearlet in Magnetic Resonance Image (MRI) of Brain


Image denoising is the process of recovering an original image from an image
corrupted with noise such as Gaussian or speckle noise. Shearlets can be used
effectively for image denoising by using various shrinkage rules. The main
steps of image denoising are:
1. Compute shearlet transform of the noisy image.
2. Apply hard and soft threshold to the obtained shearlet coefficients.
3. Reconstruct the image from the thresholded shearlet coefficients.
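These three steps can be sketched in miniature. The sketch below substitutes a one-level 1-D Haar transform for the shearlet transform, purely to illustrate the transform, threshold, reconstruct pipeline on a synthetic noisy signal; all helper names and parameter values are our own choices, not from [48]:

```python
import math, random

def haar_fwd(x):
    """One level of the orthonormal 1-D Haar transform: (averages, details)."""
    s = math.sqrt(2.0)
    a = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return a, d

def haar_inv(a, d):
    """Inverse of haar_fwd."""
    s = math.sqrt(2.0)
    out = []
    for ai, di in zip(a, d):
        out += [(ai + di) / s, (ai - di) / s]
    return out

def soft(c, lam):
    """Soft threshold: shrink |c| by lam, zeroing small coefficients."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

random.seed(0)
clean = [1.0 if 16 <= i < 48 else 0.0 for i in range(64)]   # a box signal
noisy = [c + random.gauss(0.0, 0.3) for c in clean]

a, d = haar_fwd(noisy)              # step 1: transform
d = [soft(c, 0.4) for c in d]       # step 2: threshold the detail coefficients
den = haar_inv(a, d)                # step 3: reconstruct

err_noisy = sum((x - c) ** 2 for x, c in zip(noisy, clean))
err_den = sum((x - c) ** 2 for x, c in zip(den, clean))
print(err_noisy > err_den)          # thresholding reduced the error: True
```

Because the transform is orthonormal and the clean signal's detail coefficients vanish here, shrinking the noisy details can only decrease the reconstruction error.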
Reference [48] presents the results obtained from the experimentation. The
proposed approach to image compression was explored with MRI image
datasets and the results evaluated with compression ratio, PSNR, average
difference, cross correlation and normalized absolute error calculations.

FIGURE 10.17: Wavelet Analysis Data Applied to Power System

10.3.5 Applications in Power Systems (Figure 10.17)


Wavelets were first applied to power systems in 1994 by Robertson et al. [49]
and Ribeiro [50]. Since then the number and scope of the applications have
increased. The most popular wavelet analysis applications include power
quality and measurement, partial discharges, load and revenue forecasting,
protection, and transient interruptions (www.intechopen.com).

Power Quality (Figure 10.17)


Several studies [50] focused on detecting power system disturbances using the
wavelet transform (WT) to analyze sags, swells, and other slowly changing
anomalies of non-stationary signals. The technique covers spectral contents at
low frequencies. Examining WT coefficients at high decomposition levels can
help predict onsets of disturbance events. See Advances in Wavelet Theory and
Their Applications in Engineering, Physics and Technology, InTech, 2012,
http://www.intechopen.com/books/advances-in-wavelet-theory-and-their-ap-
plications-in-engineering-physics-and-technology/application-of-wavelet-anal-
ysis-in-power-systems.
Wavelets in Oil Industry (Figure 10.18)
Siddiqi [38] studied the use of wavelet methods to identify and characterize
oil reservoirs. The petroleum product market is completely dependent on ex-
ploration, drilling, and production costs. Researchers in the oil industry focus
on developing techniques to minimize exploration and production costs and
frequently apply wavelet methods to achieve their goals. Siddiqi's article
discusses the use of the wavelet-based solution of the Buckley-Leverett
equation for reservoir modeling. The technique allows researchers to perform
virtual analyses to determine locations, quantities, and other parameters of
oil and gas reserves worldwide.

FIGURE 10.18: 3-D Morlet Scalogram for GR, SDGM 256 (wave-length, ft,
versus depth, ft)

10.4 Exercises

10.1 Let

ψ(t) = 1 if 0 ≤ t < 1/2,  −1 if 1/2 ≤ t < 1,  0 otherwise,

and

φ(t) = 1 if 0 ≤ t < 1,  0 otherwise.

ψ(t) is known as the Haar mother wavelet and φ(t) is called the Haar
scaling function or Haar father wavelet.
(a) Write the support of ψ and φ. Draw the graphs of
(b) ψ(4t − 3), ψ(4t + 3), ψ(2³t)
(c) φ(2t), φ(2t − 2), φ(2³t) on intervals of your choice.

10.2 Show that for ψ and φ of Exercise 10.1 the following relation holds:

ψ(t) = φ(2t) − φ(2(t − 1/2)).

Is this relation true for any mother and father wavelets?


10.3 Show that the Haar mother wavelet ψ and the Haar father wavelet φ
are orthogonal over (−∞, ∞), that is,
(a) ∫_{−∞}^{∞} φ(t)ψ(t) dt = 0.
(b) Verify that ∫_{−∞}^{∞} ψ²(t) dt = 1.
(c) Verify that ∫_{−∞}^{∞} φ²(t) dt = 1.

10.4 Show that for the above φ and ψ the following results hold:
(a) ∫_{−∞}^{∞} φ(t)φ(t − k) dt = 0 if k ≠ 0, and = 1 if k = 0.
(b) ∫_{−∞}^{∞} ψ(t)ψ(t − k) dt = 0 if k ≠ 0, and = ∫_{−∞}^{∞} ψ²(t) dt = 1 if k = 0.

10.5 For the Haar scaling function (Haar father wavelet) φ show whether or
not the following translated sets are orthogonal on (−∞, ∞):
(a) {φ(2t − k)}, k ∈ Z and (b) {ψ(2t − k)}, k ∈ Z.
10.6 Examine the question of Exercise 10.5 for the Haar mother wavelet.

10.7 Introduce the concept of energy of the signal and discuss its relationship
with the wavelet transform of the signal.

10.5 Suggestion for Further Reading


Wavelet analysis is an emerging field. We recommend that the reader choose
among the several references in the bibliography, particularly those covering
applications of wavelet analysis.
Bibliography

[1] P. S. Addison, The Illustrated Wavelet Transform Handbook, Institute


of Physics Publishing, 2002.

[2] A. Aldroubi and M. Unser, Wavelets in Medicine and Biology, CRC Press,
1996.
[3] H. Anton, Calculus: A New Horizon, Sixth Edition, Wiley, New York,
1999.

[4] A. Boggess and F. T. Narcowich, A First Course: Wavelets with Fourier


Analysis, Prentice Hall, 2001.
[5] S. A. Broughton and K. Bryan, Discrete Fourier Analysis and Wavelets:
Application to Signal and Image Processing, Wiley, 2009.

[6] M. Cartwright, Fourier Methods for Mathematicians, Scientists and En-


gineers, Ellis Horwood, New York, 1990.
[7] T. F. Chan and J. Shen, Image Processing and Analysis: Variational,
PDE, Wavelets and Stochastics Methods, SIAM, 2005.
[8] S. J. Chapman, MATLAB Programming for Engineers, Second Edition,
Brooks/ Cole, 2000.
[9] O. Christensen, An Introduction to Frames and Riesz Bases,
Birkhäuser, Boston, 2003.
[10] O. Christensen, Function Spaces and Expansions: Mathematical Tools
in Physics and Engineering, Birkhäuser, Boston, 2003.
[11] O. Christensen, K. L. Christensen, Approximation Theory from Taylor
Polynomials to Wavelets, Birkhäuser, Boston, 2004.
[12] K. M. O. Connor, Calculus: Labs for MATLAB, Jones and Bartlett,
2005.

[13] J. M. Cooper, A MATLAB Companion for Multivariable Calculus, Aca-


demic Press, 2001.
[14] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.


[15] J. H. Davis, Methods of Applied Mathematics with MATLAB Overview,


Birkhäuser, 2004.

[16] K. Devlin, An Electronic Companion to Calculus, Cogito Learning Media,
New York, 1997.
[17] W. Drongelen, Signal Processing for Neuroscientists:An Introduction to
the Analysis of Physiology Signals, Elsevier, 2007.

[18] K. M. Furati , M. Z. Nashed, A. H. Siddiqi (eds.), Mathematical Models


and Methods for real world systems, Chapman & Hall/CRC, Taylor and
Francis, 2006.
[19] R. Gencay, F. Seluk and B. Whitcher, Introduction to Wavelet and Fil-
tering Methods in Finance and Economics, Academic Press, 2002.

[20] R. C. Gonzales and R. E. Wood, Digital Image Processing, Addison-


Wesley Reading, MA, 1993.
[21] B. B. Hubbard, The World According to Wavelets, A. K. Peters Natick,
Second Edition, MA, 1998.

[22] A. Jerri, Introduction to Wavelets, Sampling Publishing, 2011.


[23] G. Kaiser, A Friendly Guide to Wavelets, Birkhäuser, 1994.
[24] M. Kobyashi, Wavelets and their Applications, SIAM, 1998.
[25] A. K. Louis, P. Maass and A. Reider, Wavelet Theory and Applications,
Wiley, 1997.
[26] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, New
York, 1998.
[27] Y. Meyer, Wavelets: Algorithms and applications, SIAM, Philadelphia,
1993.
[28] Y. Meyer, Lecture Notes on Mathematical Problems in Image Process-
ing, 4-22 September, 2000 Abdus Salam Centre of Theoretical Physics,
Trieste, Italy, 2000.
[29] P. V. O’Neil, Advanced Engineering Mathematics, Fifth Edition,
Brooks/Cole, 2003.
[30] D. B. Percival and A. T Walden, Wavelet Methods for Time Series
Analysis, Cambridge University Press, 2000.
[31] H. L. Resnikoff and R. O Wells Jr, Wavelet Analysis, Springer, 1998.

[32] R. H. Shumway and D. S Stoffer, Time Series Analysis and Its Appli-
cations, Springer, 2003.
Bibliography 653

[33] A. H. Siddiqi, Applied Functional Analysis, Marcel Dekker, 2004.

[34] A. H. Siddiqi, G. Korvin, et al. (eds.), Theme Issue on Wavelets & Fractals in Science and Engineering, Arab. J. Sci. Eng., 28-29, 2003-2004.

[35] A. H. Siddiqi, I. S. Duff and O. Christensen (eds.), Modern Mathematical Methods and Algorithms for Real World Systems, Anamaya and Anshan, 2006.

[36] A. H. Siddiqi, A. K. Gupta and M. Brokate, Modelling of Engineering and Technological Problems, American Institute of Physics, 2009.

[37] A. H. Siddiqi, R. C. Singh and P. Manchanda (eds.), Mathematics in Science and Technology, World Scientific, Singapore, 2011.

[38] A. H. Siddiqi, Emerging Applications of Wavelet Methods, American Institute of Physics, Melville, NY, 2012.

[39] A. H. Siddiqi, Functional Analysis with Applications (Chapters 12, 13), Springer Nature, 2017.

[40] A. H. Siddiqi, Keynote Address, International Workshop on Wavelets and Applications, September 21-24, 2016, Istanbul, Turkey.

[41] A. Garg and A. H. Siddiqi, Inverse estimation of 1-D heat problem with Neumann boundary condition by Morozov discrepancy principle, Ind. J. Industr. Appl. Math., 7:58-64, 2016.

[42] M. Bottema, B. Moran and S. Suvorova, An application of wavelets in tomography, Digital Sig. Proc., 8:244-254, 1998.

[43] A. Iske and T. Randen (eds.), Mathematical Methods and Modeling in Hydrocarbon Exploration and Production, 16, pp. 267-297, Springer, New York, 2006.

[44] G. Nolet, A Breviary of Seismic Tomography, Cambridge University Press, Cambridge, 2008.

[45] C. R. Vogel, Computational Methods for Inverse Problems, Frontiers in Applied Mathematics, SIAM, 2002.

[46] S. M. Mousavi and C. A. Langston, Hybrid seismic denoising using higher-order statistics and improved wavelet block thresholding, Bulletin of the Seismological Society of America, 106(4), 2016.

[47] S. M. Mousavi, C. A. Langston and S. P. Horton, Automatic microseismic denoising and onset detection using the synchrosqueezed continuous wavelet transform, Geophysics, 81(4), 2016.

[48] R. Aneja and A. H. Siddiqi, A hybrid shearlet-based compression of coefficients and ROI detection, Journal of Medical Imaging and Health Informatics, pp. 506-517, American Scientific Publishers, 2016.

[49] D. C. Robertson, O. Camps and I. Meyer, Wavelets and power system transients: feature detection and classification, Proceedings of SPIE, 2242:474-487, 1994.

[50] P. F. Ribeiro, Wavelet transform: an advanced tool for analyzing non-stationary harmonic distortion in power systems, Proceedings IEEE, Bologna, Italy, September 21-24, 1994.

Webpages for MATLAB and Wavelets

http://www.math.poly.edu/courses/matlab.html

http://faculty.olin.edu/bstorey/notes/Fourier.pdf

http://www.mathworks.com/products/signal

http://www.bearcave.com/misl/misl_tech/

http://oceanz.tamu.edu/~baum/wavelets.html

http://www-stat.stanford.edu/~wavelab/Wavelab_850/AboutWaveLab.pdf

http://www-stat.stanford.edu/~donoho/Reports/1995/wavelab.pdf
Chapter 11
Miscellaneous Topics Used for
Engineering Problems

11.1 Fractals in Engineering Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
11.1.1 Fractals and Interaction with Wavelets . . . . . . . . . . . . . . . . . 656
11.1.2 Fractal Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665
11.1.3 Differential Equations on Fractals . . . . . . . . . . . . . . . . . . . . . . . 670
11.1.4 Chaos and Fractals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
11.2 Introduction to Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
11.2.1 Examples of Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672
11.2.2 Wavelets and Fractals in Time Series Analysis . . . . . . . . . 676
11.2.3 Prediction of Time Series Behavior Using Wavelets and
Fractals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
11.2.4 Fractal Dimension and Predictability . . . . . . . . . . . . . . . . . . . 682
11.3 Introduction to Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
11.4 Introduction to Fuzzy and Neuro-fuzzy . . . . . . . . . . . . . . . . . . . . . . . . . . 692
11.5 Software for Time Series, Neural Network, Neuro-fuzzy and
Fuzzy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
11.6 Introduction to Graph Theory with Applications . . . . . . . . . . . . . . . 700
11.7 Applications of Spline Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
11.7.1 Polynomial Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
11.7.2 Spline Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
11.8 Compression Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
11.9 Applications of Lozi Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
11.9.1 Lozi Mappings and Secure Communications . . . . . . . . . . . . 718
11.10 Introduction to Maxwell Equations with Applications . . . . . . . . . . 719
11.11 Stochastic Calculus for Engineering Problems . . . . . . . . . . . . . . . . . . 721
11.11.1 Stochastic Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
11.11.2 Stochastic Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . 723
11.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
11.13 Suggestion for Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725


11.1 Fractals in Engineering Science


A fractal is a never-ending pattern. Fractals are infinitely complex patterns
that are similar across different scales. They are created by repeating a simple
process over and over in an ongoing feedback loop.
Benoit Mandelbrot (1924-2010) first used the word fractal to describe
a set of models that accurately depict natural geometry. Mandelbrot
challenged Euclidean geometry, whose primitives such as spheres, cuboids
and cones are not sufficient to represent complex naturally occurring
objects accurately. He put forward the geometry of fractals, containing
infinite detail, that can accurately model objects such as trees, mountains and
clouds.
By 1980 Mandelbrot accomplished his goal while working at Harvard Uni-
versity, and started working on fractal images and exploring the self-similarity
characteristic of all fractals. He also worked closely with IBM in its fractal
project and added to an incredible visual field of mathematics. Over nearly
seven decades, working with dozens of scientists, he contributed to the fields
of geology, medicine, cosmology and engineering. He used the geometry of
fractals to explain how galaxies cluster, how wheat prices change over time
and how mammalian brains fold as they grow, among other phenomena.

11.1.1 Fractals and Interaction with Wavelets


Sharp signal transitions create large-amplitude wavelet coefficients. Singu-
larities are detected by following across scales the local maxima of the wavelet
transform.
Fractals describe objects that are too irregular to fit into traditional geo-
metrical settings. Fractals occur as graphs of functions. Indeed various phe-
nomena display fractal features when plotted as functions of time. Examples
include atmospheric pressure, oil and natural gas reservoir characteristics and
stock market prices, usually when recorded over fairly long time spans. The
zooming capability of the wavelet transform not only locates isolated singular
events, but can also characterize more complex multi-fractal signals having
non-isolated singularities. Multifractals are fractal objects which cannot be
completely described using a single fractal (monofractals) dimension. They
have in fact an infinite number of dimension measures.
The wavelet transform takes advantage of multifractal self-similarities to
compute the distribution of their singularities. This singularity spectrum is
used to analyze multifractal properties. Signals that are singular at almost
every point are multi-fractals and they appear in the maintenance of economic
records, physiological data including heart records, electromagnetic fluctua-
tions in galactic radiation noise, textures in images of natural terrain and
variations of traffic flow.

Hausdorff Measure
Let U be any non-empty subset of n-dimensional Euclidean space Rn; the
diameter of U is denoted by |U| and defined as

|U| = sup{|x − y| : x, y ∈ U},

that is, the greatest distance apart of any pair of points in U. If {Ui} is a
countable (or finite) collection of sets of diameter at most δ that cover F, that
is, F ⊂ ∪_{i=1}^∞ Ui with 0 < |Ui| ≤ δ for each i, we say that {Ui} is a δ-cover of F.
Suppose that F is a subset of Rn and s is a non-negative number. For any
δ > 0 we define

H_δ^s(F) = inf{ Σ_{i=1}^∞ |Ui|^s : {Ui} is a δ-cover of F }.

Thus, one looks at all covers of F by sets of diameter at most δ and seeks
to minimize the sum of the sth powers of the diameters. As δ decreases, the
class of permissible covers of F is reduced. Therefore the infimum H_δ^s(F)
increases, and so approaches a limit as δ → 0. We write

H^s(F) = lim_{δ→0} H_δ^s(F).

This limit exists for any subset F of Rn, though the limit value can be 0 or ∞.
H^s(F) is called the s-dimensional Hausdorff measure of F. The Hausdorff
dimension of F, denoted by dimH F, is defined as

dimH F = inf{s : H^s(F) = 0}
       = sup{s : H^s(F) = ∞}.

If s = dimH F, then H^s(F) may be zero or infinite, or may satisfy

0 < H^s(F) < ∞.

The Hausdorff dimension of the middle third Cantor set is s = log 2/log 3 ≈ 0.6309.
In general, the Hausdorff dimensions of middle Cantor sets lie between 0.5 and 1.
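The value log 2/log 3 can be confirmed by a small counting experiment: stage k of the middle-third construction consists of 2^k intervals of diameter 3^(-k), so the box-counting ratio log N/log(1/δ) equals log 2/log 3 at every stage. A minimal Python sketch (the helper name and depth are ours, for illustration):

```python
import math

def cantor_intervals(level):
    """Middle-third Cantor construction: return the closed intervals of I_level."""
    intervals = [(0.0, 1.0)]
    for _ in range(level):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3.0
            nxt.append((a, a + third))       # keep the left third
            nxt.append((b - third, b))       # keep the right third
        intervals = nxt
    return intervals

# At scale delta = 3^-k the cover needs N = 2^k intervals, so
# log N / log(1/delta) = log 2 / log 3 exactly, at every stage k.
k = 8
N = len(cantor_intervals(k))                 # 2^8 = 256 intervals
dim_estimate = math.log(N) / math.log(3 ** k)
print(dim_estimate)                          # ~0.6309, i.e. log 2 / log 3
```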
Definition 105. (Self-similar set) A set S ⊂ Rn is said to be self-similar if it
is the union of disjoint subsets S1, S2, . . . , SR that can be obtained from S by
scaling, translation and rotation. It may be observed that self-similarity
often implies an infinite multiplication of details, which creates irregular struc-
tures.
The triadic Cantor set and the Von Koch curve are well known examples.

Definition 106. (Spectrum) Let Sα be the set of all points t ∈ R where the
pointwise Lipschitz regularity of f is equal to α. The spectrum of singularity,
D(α), of f is the fractal dimension of Sα. The support of D(α) is the set of α
such that Sα is not empty. The singularity spectrum gives the proportion of
Lipschitz singularities that appear at any scale a. Fractal dimension is a first
order parameter of complexity which can degenerate: very different structures
may share the same fractal dimension. Therefore Nekka and Li [34], [35] studied
the concept of the Hausdorff measure spectrum functions (HMSFs).
They introduced HMSF as a new way to distinguish sets having the same
fractal dimension. HMSF is based on the Hausdorff measure of the translation
of a set through itself in a continuous manner. Translation is made continu-
ously on each point (local) and the Hausdorff measure (global) is estimated,
so the HMSF extracts the whole information of the set. The indicator function
of the intersection of a set with its translate can be viewed as a two-point joint
moment (autocovariance). This explains in a way why HMSF completes the
information obtained from pointwise descriptors like fractal dimension. From
the definition of capacity dimension as a special form of Hausdorff dimension,
it follows that if we make a disjoint cover of the support of f with intervals of
size s, then the number of intervals that intersect Sα is

Nα(s) ≈ s^{−D(α)}.

A multifractal f is called homogeneous if all singularities have the same
Lipschitz exponent α0, which means the support of D(α) is restricted to {α0}.
Fractional Brownian motions [26] are examples of homogeneous multifractals.
Definition 107. (Fractional Brownian motion)
A fractional Brownian motion of Hurst exponent 0 < H < 1 is a zero-mean
Gaussian process BH such that BH(0) = 0 and

E{|BH(t) − BH(t − ∆)|²} = σ²|∆|^{2H}.

Partition Function
One cannot compute the pointwise Lipschitz regularity of a multifractal because its
singularities are not isolated, and finite numerical resolution is not sufficient
to discriminate them. To overcome this difficulty, Arneodo, Bacry and Muzy [56]
introduced the concept of wavelet transform modulus maxima using a global
partition function. Let ψ be a wavelet with n vanishing moments. Mallat
[57] states that if f has pointwise Lipschitz regularity α0 < n at v then the
wavelet transform has a sequence of modulus maxima that converges toward
v at fine scales. The set of maxima at scale a can thus be interpreted as a
cover of the singular support of f with wavelets of scale a. At these maxima locations

|Tψ f(a, b)| ≈ a^{α0 + 1/2}.

Let {up(a)}p∈Z be the positions of all local maxima of |Tψ f(a, b)| at a fixed
scale a. The partition function Z measures the sum at a power q of all these
wavelet modulus maxima:

Z(q, a) = Σ_p |Tψ f(a, up)|^q.

For each q ∈ R, the scaling exponent τ(q) measures the asymptotic decay of Z(q, a) at
fine scales:

τ(q) = lim_{a→0} log Z(q, a) / log a.

This typically means that

Z(q, a) ≈ a^{τ(q)}.
Theorem 76. (Arneodo, Bacry, Jaffard, Muzy [56]) Let Λ = [αmin, αmax] be
the support of D(α). Let ψ be a wavelet with n vanishing moments. If f is a
self-similar signal then

τ(q) = min_{α∈Λ} (q(α + 1/2) − D(α)).

Theorem 77. The scaling exponent τ(q) is a convex and increasing function
of q. The Legendre transform is invertible if and only if D(α) is convex, in
which case

D(α) = min_{q∈R} (q(α + 1/2) − τ(q)).

The spectrum D(α) of self-similar signals is convex. For detailed properties
of the singularity spectrum we refer to Meyer [28] and [29].
We first calculate Z(q, a) = Σ_p |Tψ f(a, up)|^q, then derive the decay scaling
exponent τ(q), and finally compute D(α) with a Legendre transform. If q < 0
then the value of Z(q, a) depends mostly on the small-amplitude maxima

|Tψ f(up, a)|.

Procedure for Numerical Calculation

1. Maxima: compute Tψ f(up, a) and the modulus maxima at each scale a.
Chain the wavelet maxima across scales.

2. Partition: compute

Z(q, a) = Σ_p |Tψ f(a, up)|^q.

3. Scaling: compute τ(q) with a linear regression of log2 Z(q, a) as a function
of log2(a):

log2 Z(q, a) ≈ τ(q) log2 a + C(q).

4. Spectrum: compute

D(α) = min_{q∈R} (q(α + 1/2) − τ(q)).
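Step 4, the Legendre transform, can be sketched numerically. Below we feed it the τ(q) of a hypothetical homogeneous signal with a single exponent α0 = 0.5 and D(α0) = 1 (the form given by Theorem 76); the grid and values are illustrative, not drawn from any real signal:

```python
# Hypothetical tau(q) for a homogeneous signal with a single Lipschitz
# exponent alpha0 = 0.5 and D(alpha0) = 1 (e.g. a Brownian path), for which
# Theorem 76 gives tau(q) = q*(alpha0 + 1/2) - 1.
alpha0, D0 = 0.5, 1.0
qs = [q / 10.0 for q in range(-50, 51)]            # moment orders q in [-5, 5]
tau = {q: q * (alpha0 + 0.5) - D0 for q in qs}

def spectrum(alpha):
    """Legendre transform: D(alpha) = min_q [ q*(alpha + 1/2) - tau(q) ]."""
    return min(q * (alpha + 0.5) - tau[q] for q in qs)

print(spectrum(alpha0))   # ~1.0: the transform recovers D(alpha0)
```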

Smooth Perturbations
Let f be a multifractal whose spectrum of singularity D(α) is calculated from
τ(q). If a regular signal g is added to f then the singularities are not modified
and the singularity spectrum of f̃ = f + g remains unchanged. We study the
effect of this smooth perturbation on the spectrum calculation.
The wavelet transform of f̃ is T f̃(u, s) = T f(u, s) + T g(u, s).
Let τ(q) and τ̃(q) be the scaling exponents of the partition functions Z(q, s)
and Z̃(q, s) calculated from the modulus maxima of f and f̃ respectively.
Theorem 78. (Arneodo, Bacry, Muzy) Let ψ be a wavelet with exactly n
vanishing moments. Suppose that f is a self-similar function.

1. If g is a polynomial of degree p < n then τ̃(q) = τ(q) for all q ∈ R.

2. If g^(n) is almost everywhere non-zero then

τ̃(q) = τ(q) for q ≥ qc and τ̃(q) = (n + 1/2)q for q ≤ qc,

where qc is defined by

τ(qc) = (n + 1/2)qc.
Hurst Exponent and Application
Many processes have random (stochastic) components and also
exhibit some predictability between one element and the next. In statistics,
this is sometimes described by the autocorrelation function (the correlation
of a data set with a shifted version of itself). The autocorrelation is one
measure of whether a past value can be used to predict a future value.
A random process that has some degree of autocorrelation is referred to as a
long memory process (or long range dependence). River flow exhibits this kind
of long-term dependence. The hydrologist H. E. Hurst studied Nile River flows
and reservoir modeling [27 and website], and the Hurst exponent, to be estimated
in the sequel using a wavelet method, is named for him. In the recent past,
applications of the Hurst exponent have attracted the attention of researchers
working in different fields.
The Hurst exponent is also directly related to the fractal dimension, which
gives a measure of the roughness of a surface. The fractal dimensions and
their refinements have been used in diverse fields. The relationship between
the Hausdorff fractal dimension D and the Hurst exponent H is D = 2 − H.
Correlations
Let us recall the concepts of cross correlation and auto-correlation, which are
closely related to Hurst exponents and the analysis of real world systems.
Cross Correlation
The cross-correlation coefficient r, which is a measure of linear association
between two variables, is defined as

r = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / ( Σ_{i=1}^{n} (Xi − X̄)² Σ_{i=1}^{n} (Yi − Ȳ)² )^{1/2}.

A positive value of the coefficient r indicates that as one variable increases
the other tends to increase, whereas a negative value indicates that as one
variable increases the other tends to decrease.
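The coefficient r can be computed directly from its definition; a short Python sketch with made-up samples (the function name is ours):

```python
import math

def cross_correlation(x, y):
    """Cross-correlation coefficient r of two equal-length samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                    sum((yi - my) ** 2 for yi in y))
    return num / den

print(cross_correlation([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0: both increase together
print(cross_correlation([1, 2, 3, 4], [8, 6, 4, 2]))    # -1.0: one decreases
```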
Auto-Correlation
We say that a data set exhibits auto-correlation if the value Xi at time ti is
correlated with the value Xi+d at time ti+d, where d is some time increment in the
future. In a long memory process the auto-correlation decays over time fol-
lowing a power law, namely

p(k) = Ck^{−α},

where C is a constant and p(k) is the auto-correlation function with lag k. For
given X1, X2, . . . , Xn at times t1, t2, . . . , tn the k-lag auto-correlation
function is defined as

rk = Σ_{i=1}^{n−k} (Xi − X̄)(Xi+k − X̄) / Σ_{i=1}^{n} (Xi − X̄)²,

where X̄ = (X1 + X2 + · · · + Xn)/n.
It may be remarked that in the above definition the observations are uni-
formly sampled. Unlike cross correlation, the auto-correlation results in a cor-
relation coefficient indicating the degree of similitude between two values of the
same variable at times ti and ti+k.
The auto-correlation function is used to detect non-randomness in data
and to identify an appropriate time series model if the data are not random.
When auto-correlation is used to detect non-randomness, it is usually only the
1-lag auto-correlation that is of interest. When the auto-correlation is used to
identify an appropriate time series model, the k-lag auto-correlation is plotted.
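The k-lag auto-correlation translates directly into code; the two toy series below illustrate persistent (positive r1) and anti-persistent (negative r1) behavior:

```python
def autocorrelation(x, k):
    """k-lag auto-correlation r_k of a uniformly sampled series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k))
    den = sum((xi - mean) ** 2 for xi in x)
    return num / den

# A strictly increasing series is persistent (positive 1-lag auto-correlation);
# an alternating series is anti-persistent (negative 1-lag auto-correlation).
print(autocorrelation([1, 2, 3, 4, 5, 6], 1))       # 0.5
print(autocorrelation([1, -1, 1, -1, 1, -1], 1))    # negative
```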
Relationship of Autocorrelation and Hurst Exponent
The exponent α is related to the Hurst exponent by the equation

H = 1 − α/2.

The value of the Hurst exponent lies between 0 and 1. A value of 0.5 indicates
a true random walk (a Brownian motion time series); in a random walk
there is no correlation between any element and future elements. A Hurst
exponent value 0.5 < H < 1 indicates persistent behavior (for example a
positive auto-correlation): an increase from ti−1 to ti will probably be followed
by an increase from ti to ti+1, and likewise a decrease will tend to follow a
decrease. A Hurst exponent value 0 < H < 0.5 indicates anti-persistent
behavior (negative auto-correlation): an increase will tend to be followed by
a decrease, and a decrease by an increase. The larger the Hurst exponent, the
smaller the fractal dimension and the smoother the surface.
Wavelet Method to Estimate Hurst Exponent
We describe here a method for estimating the Hurst exponent of a time series
(signal or function) using wavelets.
The total energy contained in a signal f(t) is defined as

E = ∫_{−∞}^{∞} |f(t)|² dt = ||f||².

The two-dimensional wavelet energy density function is defined as

E(a, b) = |Tψ f(a, b)|².

It signifies the relative contribution of the energy contained at
a specific scale a and location b. A plot of E(a, b) is known as a scalogram
(analogous to a spectrogram). The scalogram can be integrated across a and b
to recover the total energy in the signal using the admissibility constant Cg.
Integrating over b gives

E(a) = (1/Cg) ∫_{−∞}^{∞} |T(a, b)|² db.

Peaks in E(a) highlight the dominant energy scales within the signal. The
wavelet power spectrum Pw(ξ) is defined as

Pw(ξ) = (1/τ) EW(ξ) = (1/(τ ξc Cg)) ∫_0^τ |Tψ f(a, b)|² db,

where

ξc = ( ∫_0^∞ ξ² |ψ̂(ξ)|² dξ / ∫_0^∞ |ψ̂(ξ)|² dξ )^{1/2},

τ is the length of the signal, and ξ = ξc/a. The wavelet variance, defined as

σ²(a) = (1/τ) ∫_0^τ |T(a, b)|² db,

is often used in practice to determine the dominant scale in the signal. We assume
that τ is of sufficient length to gain a reasonable estimate of σ²(a).
Brownian walks can be generated from a defined Hurst exponent. If the
Hurst exponent is 0.5 < H < 1, the random walk will be a long memory pro-
cess. Data sets like this are referred to as fractional Brownian motion (fbm).
The motion can be generated by a variety of methods including the wavelet trans-
form. It may be recalled that in 1827 the botanist R. Brown noticed that minute
particles suspended in liquid moved on highly irregular paths. There are sev-
eral phenomena of this nature, such as smoke particles in air and fluctuations in
the stock market recorded at intervals of 5 minutes during market hours. Ein-
stein published a mathematical study of this motion, which eventually led to
Perrin's Nobel Prize winning calculation of Avogadro's number [43].
It may be noted that the spectral density is proportional to 1/ξ^β where β = 2H + 1,

PW(ξ) ∝ 1/ξ^β,

where ξ = ξc/a and a is the scale parameter. For this reason the fbm is
sometimes referred to as 1/ξ^β noise. There are several methods [43] to estimate the
Hurst exponent for real world systems. We present here one of those methods,
which is based on wavelet methodology. The wavelet coefficient variance at scale
index j, denoted by σj², is defined as follows:

σj² = Σ_{k=0}^{2^{J−j}−1} (dj,k)² / 2^{J−j}.

It is known that the wavelet power spectrum PW(ξ) is related to the wavelet
coefficient variance by

PW(ξj) = (2^j ∆t / (τ ln 2)) Σ_{k=0}^{2^{J−j}−1} d²_{j,k} = (∆t / ln 2) σj²,

as τ = 2^J ∆t, where ∆t is the sampling time (which one may choose). That is,

PW(ξj) ∝ σj².

Combining these relations and the fact that the frequency ξ is inversely propor-
tional to the wavelet scale aj = 2^j, we obtain the scaling relationship

σj² ∝ aj^{2H+1},

or, for the standard deviation,

σj ∝ aj^{H+1/2}.

For an orthonormal multiresolution analysis using a dyadic grid, the scale am
is proportional to 2^m. We can therefore take base 2 logarithms of both sides
of σm² ∝ am^{2H+1} to get the equation

log2(σm²) = (2H + 1)m + C,
where the constant C depends both on the wavelet used and the Hurst ex-
ponent. Plotting log2(σm²) against m, the slope λ of the regression line gives
the Hurst exponent:

H = (λ − 1)/2.

This method has been applied to estimate the Hurst exponents for financial
and rainfall data [43] for different time intervals. A graphic user interface
(GUI) has been developed.
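The whole recipe — detail-coefficient variances per dyadic level, a regression of log2(σm²) on m, then H = (λ − 1)/2 — can be sketched as follows. We use a Haar transform and a synthetic Brownian walk (true H = 0.5); the seed, sample length and number of levels are arbitrary choices, and the estimate is only approximate:

```python
import random
import math

def haar_detail_variances(x, max_level):
    """Orthonormal Haar DWT: variance of the detail coefficients at each
    level m = 1..max_level (level m corresponds to scale 2^m)."""
    variances = []
    approx = list(x)
    for _ in range(max_level):
        details = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2)
                   for i in range(len(approx) // 2)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2)
                  for i in range(len(approx) // 2)]
        variances.append(sum(d * d for d in details) / len(details))
    return variances

# Synthetic Brownian walk: cumulative sum of Gaussian steps (true H = 0.5).
random.seed(1)
walk, pos = [], 0.0
for _ in range(2 ** 14):
    pos += random.gauss(0.0, 1.0)
    walk.append(pos)

var = haar_detail_variances(walk, 8)

# Linear regression of log2(variance) on the level m; slope lam ~ 2H + 1.
ms = list(range(1, len(var) + 1))
ys = [math.log2(v) for v in var]
mbar = sum(ms) / len(ms)
ybar = sum(ys) / len(ys)
lam = (sum((m - mbar) * (y - ybar) for m, y in zip(ms, ys)) /
       sum((m - mbar) ** 2 for m in ms))
H = (lam - 1) / 2
print(H)   # roughly 0.5 for a Brownian walk
```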
Wavelet Correlation Coefficient
As mentioned above, wavelets can detect both the location and the scale of
a structure. Wavelets are parameterized both by a dilation parameter a > 0
and a translation parameter b (−∞ < b < ∞) such that

ψa,b = ψ((x − b)/a).

The wavelet domain of a one-dimensional function is thus two-dimensional
in nature: one dimension corresponds to scale and the other to translation. The
continuous wavelet transform in one dimension is defined as

ω(a, b) = ∫_{−∞}^{∞} f(x) ψ*((x − b)/a) dx,

where a is the scale, f(x) is a one-dimensional function and ψ* (* denotes the
complex conjugate) is the analyzing wavelet, also known as the mother wavelet.
We can choose a Mexican hat wavelet as a possible analyzing wavelet:

ψ(|x − b|/a) = (1/(2π)^{1/2}) (2 − |x − b|²/a²) exp(−|x − b|²/a²).

The wavelet spectrum M(a) is defined as

M(a) = (1/a) ∫_{−∞}^{∞} ω(a, b)² db.

The wavelet spectrum has the power law behavior

M(a) ≈ a^λ.

The wavelet spectrum M(a) gives the energy of the wavelet coefficients at scale a.
Here λ is an exponent whose value is determined by the power of a.
The wavelet cross correlation coefficient is defined as

rw = ∫ ω1(a, b) ω2*(a, b) db / (M1(a) M2(a))^{1/2}.

The relation between the correlation coefficient and the wavelet correlation
coefficient can be written as

r = ∫ γω(a) (M1(a) M2(a))^{1/2} a^{−1/2} da / ( ∫ M1(a) a^{−1} da ∫ M2(a) a^{−1} da )^{1/2}.

If r is negative then the variations at the two stations are not correlated;
for positive values the variations are correlated. To check the features
of temperature variations at two meteorological stations, one can plot the wavelet
spectra of the temperatures at these stations and compare the features of the
correlation coefficients. To check the correlation between any two
parameters at two different stations, we compute rw or r from the equations
above.

11.1.2 Fractal Image Processing


Image compression techniques based on fractals have been developed in
the last decade and promise better compression performance. These tech-
niques are based on the recognition that fractals can describe natural scenes
better than shapes of traditional geometry. There are three main fractal image
compression techniques: (i) fractal image compression based on iterated function
systems (IFS), (ii) segment-based coding, and (iii) yardstick coding. Here we
confine ourselves to the first technique, where images are compressed into
compact IFS codes at the encoding stage and fractal images are generated to
approximate the original image at the decoding stage.
For this technique, we refer to Neunzert and Siddiqi [35] and references
therein. The word fractal was coined by Benoit Mandelbrot from the Latin
word fractus, meaning broken, for describing objects that were too irregular
to fit into the traditional geometrical setting. There are several definitions of
fractals but we shall treat fractals as fixed points of mappings on the metric
spaces of all compact subsets of a complete metric space into itself. Cantor
set, Sierpinski gasket, Sierpinski carpet and Von Koch curves are examples of
fractals. An image can be approximated by these objects and some numbers
or quantities, characteristic of these objects will be transmitted or stored
and the original image can be retrieved from these characteristic numbers or
quantities.
For example, the fern image can be represented by 24 integers which can
be represented by 192 bits (if each integer is represented by 8 bits). On the
other hand, if we store fern pixels, we need 307,200 bits (supposing that the
image size is 480 lines by 640 pixels). Therefore, we achieved a compression
ratio of 1600 to 1 by the IFS technique of fractal image compression. Here,
we briefly introduce the fractal objects, IFS theorem and collage theorem.
Methods for finding IFS for images can be found in [35]. Commercial software
to implement the process is available.
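The bookkeeping behind the 1600-to-1 figure is easy to verify:

```python
# The fern example from the text: 24 IFS integers at 8 bits each, versus
# storing every pixel of a 480-line by 640-pixel binary image (1 bit/pixel).
ifs_bits = 24 * 8            # 192 bits of IFS code
pixel_bits = 480 * 640       # 307200 bits of raw pixels
ratio = pixel_bits // ifs_bits
print(ifs_bits, pixel_bits, ratio)   # 192 307200 1600
```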
Cantor Set
The Cantor set C is a subset of the metric space X = [0, 1] which is obtained
by successive deletions of middle third open subintervals, as follows:

I0 = [0, 1]
I1 = [0, 1/3] ∪ [2/3, 1]
I2 = [0, 1/9] ∪ [2/9, 3/9] ∪ [6/9, 7/9] ∪ [8/9, 1]
I3 = [0, 1/27] ∪ [2/27, 3/27] ∪ [6/27, 7/27] ∪ [8/27, 9/27] ∪ [18/27, 19/27] ∪ [20/27, 21/27] ∪ [24/27, 25/27] ∪ [26/27, 1]
I4 = I3 minus the middle open third of each interval in I3
. . .
IN = IN−1 minus the middle open third of each interval in IN−1.

The Cantor set C is defined as

C = ∩_{n=0}^{∞} In,

where I0 ⊇ I1 ⊇ I2 ⊇ I3 ⊇ · · · ⊇ IN−1 ⊇ IN ⊇ IN+1 ⊇ · · ·.
C ≠ ∅ as 0 ∈ C. C is a perfect set.
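Membership in each stage In can be tested exactly with rational arithmetic: for instance 1/4, whose base-3 expansion 0.020202... contains no digit 1, survives every deletion, while 1/2 is removed at the first step. A small Python sketch (the helper name is ours):

```python
from fractions import Fraction

def in_level(x, n):
    """Exact test: does x lie in I_n of the middle-third construction?"""
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(n):
        third = (hi - lo) / 3
        if x <= lo + third:
            hi = lo + third          # x survives in the left third
        elif x >= hi - third:
            lo = hi - third          # x survives in the right third
        else:
            return False             # x fell in a deleted middle third
    return True

print(all(in_level(Fraction(1, 4), n) for n in range(10)))   # True: 1/4 in C
print(in_level(Fraction(1, 2), 1))                           # False: removed
```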


FIGURE 11.1: Sierpinski Gasket

In Figure 11.1, X1, Y1, Z1 are the middle points of the sides AB, BC and AC,
respectively. Remove the triangle with vertices X1, Y1 and Z1; the middle
points of the sides of the remaining triangle AX1Z1 are denoted X1′, Y1′, Z1′,
and so on. Next we remove the central triangles of the three left-over triangles
of E1.
Continue this process. Whatever is left is called the Sierpinski gasket or
Sierpinski triangle (Figure 11.1). For an interesting account of the Sierpinski
gasket, we refer to Stewart [45].
Iterated Function System

Theorem 79. (Banach contraction fixed point theorem) Let X be a com-
plete metric space and let T : X → X be a contraction mapping, that is,
d(T(x), T(y)) ≤ αd(x, y), 0 ≤ α < 1. Then T has a unique fixed point, that
is, there exists a unique u ∈ X such that T u = u. Furthermore, T^{◦n}(y) → u
as n → ∞ for every y ∈ X, where the iterates T^{◦n}(·) are defined by

T^{◦0}(x) = x,
T^{◦1}(x) = T(x), T^{◦2}(x) = T(T(x)) = T(T^{◦1}(x)),
T^{◦3}(x) = T(T^{◦2}(x)), . . . , T^{◦n}(x) = T(T^{◦(n−1)}(x)).
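The iteration T^{◦n}(y) → u is easy to demonstrate for a simple contraction on R; the example map, tolerance and function name below are ours:

```python
def iterate_to_fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Picard iteration x_{n+1} = T(x_n); converges when T is a contraction."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# T(x) = x/2 + 1 is a contraction on R with factor 1/2; its fixed point is 2.
print(iterate_to_fixed_point(lambda x: x / 2 + 1, 0.0))   # ~2.0
```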


FIGURE 11.2: Von Koch Curve

This theorem is a key to fractal image compression.

Definition 108. Let (X, d) be a complete metric space. X together with a
finite set of contraction mappings Wn, n = 1, 2, . . . , N, with contractivity factors
sn, n = 1, 2, . . . , N, is called an "iterated function system," abbreviated IFS,
where

s = max_n sn.

An IFS will be denoted by {X; Wn, n = 1, 2, . . . , N} with contractivity factor
s = max_n sn.



Example 318. {R; W1(x) = 0, W2(x) = (2/3)x + 1/3} is an IFS.

Let (X, d) be a complete metric space and H(X) = {K ⊂ X : K compact, K ≠ ∅}.
For B ∈ H(X), x ∈ X and A ∈ H(X) define

d(x, B) = min{d(x, y) : y ∈ B},
d(A, B) = max{d(x, B) : x ∈ A} (in general d(A, B) ≠ d(B, A)),
h(A, B) = max{d(A, B), d(B, A)},

so that h is the Hausdorff distance on H(X). The IFS theorem asserts that the
mapping W : H(X) → H(X) associated with an IFS {X; Wn, n = 1, 2, . . . , N}
(see Lemma 4 below) is a contraction on (H(X), h) and therefore has a unique
fixed point A ∈ H(X) satisfying

A = W(A) = ∪_{n=1}^{N} Wn(A),

which is given by

A = lim_{n→∞} W^{◦n}(B), for any B ∈ H(X).

Definition 109. The fixed point A ∈ H(X) described in the IFS theorem
is called the attractor of the IFS. The attractor is also called a deterministic
fractal or simply a fractal.
Let W : R² → R² be a contraction map; then a set A and its image W(A)
are shown in Figure 11.3.

14'(A1

FIGURE 11.3: Contraction map

Relation between mappings on X and H(X)


Lemma 2. Let W1 be a continuous mapping of the metric space (X, d) into itself. Then W1 maps H(X) into itself.
Lemma 3. Let V : X → X be a contraction mapping on the metric space (X, d) with contractivity factor s. Then W : H(X) → H(X) defined by W(B) = {V(x) : x ∈ B}, for all B ∈ H(X), is a contraction mapping on (H(X), h) with the same contractivity factor.

Lemma 4. Let (X, d) be a metric space and {Wn, n = 1, 2, 3, . . . , N} be an IFS. Let the contractivity factor for Wn be sn. Define W : H(X) → H(X) by

W(B) = W1(B) ∪ W2(B) ∪ · · · ∪ WN(B) = ∪_{n=1}^{N} Wn(B)

for each B ∈ H(X). Then W is a contraction mapping with contractivity factor s = max{sn : n = 1, 2, . . . , N}.

Example 319. (a) X = [0, 1], W1(x) = (1/3)x, W2(x) = (1/3)x + 2/3. W1 and W2 are contraction maps of X into itself. The deterministic fractal, or fractal, of the IFS {X, Wn, n = 1, 2} is the Cantor set.
(b) X = [0, 1] × [0, 1],

Wi(x) = Ti x + bi, i = 1, 2, 3, x = (x, y)^T ∈ X,

where

T1 = T2 = T3 = [ 1/2  0 ; 0  1/2 ],  b1 = (0, 0)^T,  b2 = (1/4, 1/4)^T,  b3 = (1/2, 0)^T.

The fixed point A of the map W : H(X) → H(X) defined by W(B) = ∪_{n=1}^{3} Wn(B) is the Sierpinski gasket.
(c) Sierpinski gasket and Von Koch curve are examples of fractals (determin-
istic fractals).
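As a numerical sketch of the attractor in part (b), the following Python code uses the random-iteration ("chaos game") variant of the IFS iteration — an illustrative device here, not the book's deterministic set iteration: repeatedly apply a randomly chosen Wi and record the orbit, which accumulates on the Sierpinski gasket.

```python
import random

# Offsets b_1, b_2, b_3 of Example 319(b); each map halves both coordinates.
OFFSETS = [(0.0, 0.0), (0.25, 0.25), (0.5, 0.0)]

def apply_map(i, p):
    """W_i(p) = T p + b_i with T = diag(1/2, 1/2)."""
    x, y = p
    bx, by = OFFSETS[i]
    return (0.5 * x + bx, 0.5 * y + by)

def chaos_game(n_points=20000, seed=0):
    """Approximate the attractor by applying a randomly chosen W_i at each
    step; after a short transient the orbit lies (numerically) on the gasket."""
    rng = random.Random(seed)
    p = (rng.random(), rng.random())
    pts = []
    for k in range(n_points):
        p = apply_map(rng.randrange(3), p)
        if k > 20:  # discard the transient approach to the attractor
            pts.append(p)
    return pts

pts = chaos_game()
# The attractor lies in the triangle with vertices (0,0), (1,0), (1/2,1/2):
# the fixed points of W_1, W_3 and W_2 respectively.
print(len(pts))
```

Each map halves distances, so after the 21 discarded steps the orbit is within about 2^(-21) of the attractor.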
Theorem 80. (Collage theorem) Let B be an arbitrary target image (an element of H(X), X = R²) and {X, Wn, n = 1, 2, . . . , N} be an iterated function system with contractivity factor s, 0 < s < 1. Further, let W(B) be as in Lemma 4 such that the Hausdorff distance between B and W(B) is less than ξ. Then the Hausdorff distance between B and the attractor A of the given IFS is less than (1 − s)^{−1} ξ.
Remark 81. (a) The IFS theorem tells us that each IFS defines a unique fractal image (attractor) and that the attractor of the IFS is exactly the union of the transformations of the attractor.
(b) Small (affine) deformed copies of the target are arranged so that they cover the target as exactly as possible. This collage of deformed copies determines an IFS. The collage theorem tells us that the better the collage, as measured by the Hausdorff distance, the closer the attractor of the IFS will be to the target.
(c) An interesting consequence of the collage theorem is that if the matrix
entries in two codes are close, attractors of the codes are also close. In other
words, small errors in codes lead to small errors in images.

11.1.3 Differential Equations on Fractals


A fine account of partial differential equations on fractals is given in a dis-
sertation from Uppsala University [38] surveying fundamental research work
to date. The author refers to exciting work by Strichartz [46] through [49], Mosco
[32] and Kigami [17] and [18]. In 1989, Kigami gave an analytic construction of a
Laplacian on the Sierpinski gasket. Around 2003, Strichartz showed that there are first order
linear differential equations based on the Laplacian that are not solvable on
the Sierpinski gasket. Pelander gave a characterization of the polynomials p
for which the differential equation p(∆)u = f is solvable on any open subset
of the Sierpinski gasket for any continuous function f on the subset. Mosco
[32] studied energy functionals of certain fractal structures. These authors
present an interesting exposition on the connections to the work of physicists.
Pelander [38] indicated in his thesis how these studies are relevant to physics
and engineering.
Understanding the complexities in a non-linear dynamical system is of
great interest in contemporary sciences. Both chaos and fractals offer important ingredients toward this effort. Even a simple quadratic nonlinearity
as represented by the logistic map is known to generate, for a sufficiently large
control parameter, a fractal attractor, indicating the onset of the deterministic
chaos in the model.
Biologists studying the variability in populations of various species found
an equation that predicted animal populations reasonably well. This was a
simple quadratic equation called the logistic difference equation.
Example 320. The logistic difference equation is given by

xn+1 = rxn (1 − xn )

where r is the so-called driving parameter. The equation is used by starting with a fixed value of the driving parameter, r, and an initial value of x0. One
then runs the equation recursively, obtaining x1 , x2 , . . . , xn . For low values
of r, xn (as n goes to infinity) eventually converges to a single number. In
biology, this number (xn as n approaches infinity) represents the population
of a species. When the driving parameter r is slowly turned up, interesting
things happen. When r = 3.0, xn no longer converges; it oscillates between two
values. This characteristic change in behavior is called a bifurcation. Turn up
the driving parameter even further and xn oscillates between not two, but four
values. As one continues to increase the driving parameter, xn goes through
bifurcations of period eight, then sixteen, then chaos! When the value of the
driving parameter r equals 3.57, xn neither converges nor oscillates; its value
becomes completely random. For values of r larger than 3.57, the behavior is
largely chaotic. However, there is a particular value of r where the sequence
again oscillates with a period of three.
The bifurcation diagram of the logistic difference equation is shown in Figure
11.4.
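The behavior described above can be checked directly by iterating the map. The Python sketch below uses the parameter values mentioned in the text (r = 2.5 and r = 3.2 before and after the first bifurcation); the helper function itself is an illustrative device:

```python
def logistic_tail(r, x0=0.2, transient=1000, keep=8):
    """Iterate x_{n+1} = r x_n (1 - x_n), discard the transient, and return
    the next `keep` values of the orbit."""
    x = x0
    for _ in range(transient):
        x = r * x * (1 - x)
    tail = []
    for _ in range(keep):
        x = r * x * (1 - x)
        tail.append(x)
    return tail

# r = 2.5: the orbit settles to the fixed point 1 - 1/r = 0.6.
print(logistic_tail(2.5))
# r = 3.2: past the first bifurcation, the orbit visits exactly two values.
print(sorted(set(round(v, 6) for v in logistic_tail(3.2))))
```

Repeating this for many values of r and plotting the tail against r reproduces the bifurcation diagram of Figure 11.4.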
FIGURE 11.4: Bifurcation diagram of Pn+1 = rPn(1 − Pn): steady-state values of Pn versus the driving parameter r (starting value P0 = 0.2)

11.1.4 Chaos and Fractals


Relation between Chaos and Fractals
Chaotic systems and fractals both arise from iterated functions. Each iteration takes the state of the previous iteration as an input to produce the next state.
Some chaotic systems like Figure 11.4 show fractal repetition when you
zoom in. Asymmetric fractals on the other hand, can get chaotic and their
fractal nature can be hard to spot.
Although fractals can show beautiful patterns, you can’t actually predict
their value in a specific iteration other than by calculating all the preceding
iterations. So in a sense, fractals are just beautiful chaos.

11.2 Introduction to Time Series


Many statistical methods relate to data which are independent, or at least
uncorrelated. There are many practical situations where data might be corre-
lated. This is particularly so where repeated observations on a given system
are made sequentially in time.
Definition 110. A time series is a sequence of numerical data points in
successive order.
In investing, a time series tracks the movements of the chosen data points,
such as a security’s price, over a specified period of time with data points
recorded at regular intervals. There is no minimum or maximum amount of
time that must be included, allowing the data to be gathered in a way that
provides the information being sought by the investor or analyst examining the activity. Data gathered sequentially in time are time series.

Definition 111. A time series is said to be stationary if it exhibits no systematic trend, no systematic changes in variance, and if it shows no periodic variations or seasonality.
Most processes in nature are non-stationary. Time series calculations have
many applications, for example:

• Descriptions of data in the forms of summary statistics and graphs


• Analysis and interpretation of data via models that demonstrate time
dependence of data
• Forecasting of future activity based on samples from a series

• Controls made possible by adjusting and analyzing parameters to make


a series fit closer to a target
• Adjustment of estimated variances or errors in a linear model to form a
time series of correlated observations

11.2.1 Examples of Time Series


Time series are analyzed to demonstrate the underlying structures and
functions that produce the observations. Understanding the mechanisms of a
time series allows a mathematical model to be developed that explains the data
in such a way that prediction, monitoring, or control can be achieved. Some
examples in which time series arise are economics and finance, environmental
modelling, meteorology and hydrology, demographics, medicine, engineering
and quality control. The simplest form of data is a long series of continuous
measurements at equally spaced time points. Observations are made at dis-
tinct points in time, and the observations may take values from a continuous
distribution.
Example 321. Analyzing time series airline passenger data using MATLAB.
First we create an array of monthly counts of airline passengers (measured in
thousands) from January 1949 through December 1960. See Figure 11.5.
% 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
y = [112 115 145 171 196 204 242 284 315 340 360 417 % Jan
118 126 150 180 196 188 233 277 301 318 342 391 % Feb
132 141 178 193 236 235 267 317 356 362 406 419 % Mar
129 135 163 181 235 227 269 313 348 348 396 461 % Apr
121 125 172 183 229 234 270 318 355 363 420 472 % May
135 149 178 218 243 264 315 374 422 435 472 535 % Jun
148 170 199 230 264 302 364 413 465 491 548 622 % Jul
148 170 199 242 272 293 347 405 467 505 559 606 % Aug
136 158 184 209 237 259 312 355 404 404 463 508 % Sep
119 133 162 191 211 229 274 306 347 359 407 461 % Oct
104 114 146 172 180 203 237 271 305 310 362 390 % Nov
118 140 166 194 201 229 278 306 336 337 405 432 ]; % Dec
When we create a time series object, we can keep the time information along
with the data values. We have monthly data, so we create an array of dates
and use it along with the y data to create the time series object. The MATLAB
code is:
yr = repmat((1949:1960),12,1);
mo = repmat((1:12)',1,12);
time = datestr(datenum(yr(:),mo(:),1));
ts = timeseries(y(:),time,'name','AirlinePassengers');
ts.TimeInfo.Format = 'dd-mmm-yyyy';
tscol = tscollection(ts);
plot(ts)

FIGURE 11.5: Time Series Plot: Airline Passenger data 1949 to 1960

Example 322. We can use the above example to examine trend and seasonality
(Figure 11.6). This series seems to have a strong seasonal component, with
a trend that may be linear or quadratic. Furthermore, the magnitude of the
seasonal variation increases as the general level increases. Perhaps a log
transformation would make the seasonal variation more constant. First we
change the axis scale:
h_gca = gca;
h_gca.YScale = 'log';

FIGURE 11.6: Time Series Plot: Airline Passenger Seasonality Trends

Time series can also be defined as follows.


Definition 112. A time series is a collection of data recorded over time (weekly, monthly, quarterly) to generate a history that can be used by management to make current decisions and plans based on long-term forecasting. It usually uses past patterns to project the future.
Components of Time Series

1. Secular trend: the smooth long-term direction of a time series (Figure 11.7).

2. Cyclical variation: the rise and fall of a time series over periods longer than one year (Figure 11.8).

3. Seasonal variation: patterns of change in a time series within a year that tend to repeat each year (Figure 11.9).

4. Irregular variation: classified as episodic (unpredictable but identifiable) or residual (chance fluctuation, unidentifiable).
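As a hedged illustration of separating the secular trend from the seasonal variation, the sketch below applies the standard 2×12 centered moving average to a synthetic monthly series (both the series and the filter are textbook devices, not taken from this chapter):

```python
import math

def centered_ma_12(series):
    """2x12 centered moving average: half weight on the two end months.
    This filter passes a linear trend unchanged and removes any pattern
    that repeats exactly every 12 observations."""
    out = []
    for i in range(6, len(series) - 6):
        s = 0.5 * series[i - 6] + sum(series[i - 5:i + 6]) + 0.5 * series[i + 6]
        out.append(s / 12)
    return out

# Synthetic monthly series: secular trend 10 + 0.5 t plus a seasonal sine.
series = [10 + 0.5 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(60)]
trend = centered_ma_12(series)
print(trend[:3])  # estimates of the secular trend at t = 6, 7, 8
```

Subtracting the estimated trend from the series leaves the seasonal plus irregular components.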
FIGURE 11.7: Secular trend: number of associates at Home Depot, Inc., 1993 to 2007

FIGURE 11.8: Cyclical variation around the long-term secular trend, 1989 to 2009

FIGURE 11.9: Seasonal variation: quarterly sales of baseball and softball equipment, Hercher Sporting Goods, 2007 to 2009



11.2.2 Wavelets and Fractals in Time Series Analysis


Analysis of observed experimental data studied at different time points
is known as time series analysis. The technique allows us to study trends,
seasonal variations, business cycles, impacts of abrupt changes, impacts of
unwanted components, drift factors, self-similarities, and compression. Time
series analysis is invaluable for forecasting future business parameters.
Economists and bankers use time series analysis to monitor stock market
activities, unemployment figures and foreign exchange rates. Social scientists
study population changes, birth rates, and educational statistics. In medicine,
the method is used to study epidemics and pandemics and clinical trials of
drugs. Electrocardiograms, magnetic resonance imaging, and other techniques
are used to diagnose patients. Time series analysis is also useful for studying
respiratory patterns, blood pressure fluctuations, and gastrointestinal prob-
lems. Geneticists use time series analysis in DNA research. In the industrial arena,
time series analysis is used to monitor machine performance and prevent wear and
breakage. Other uses include analysis of:

• Nuclear reactor performance


• Industrial equipment vibrations
• Global warming indicators
• Speech patterns
• El Niño impacts
• Wildlife populations
• Earthquake prediction
• Explosion effects
• Physiological parameters that measure performance of cardiovascular,
respiratory and other body systems:
– Electrocardiograms (ECG) to measure heart rate functioning
– Electroencephalograms (EEGs) to measure brain waves
– Electrooculograms (EOGs) to measure ocular performance
– Electromyograms (EMGs) to study muscle functioning
– Ultrasound to measure various anatomic systems

Electroencephalogram (EEG)
Electroencephalogram is used to study brain activity. A seizure causes a sud-
den surge of electrical activity in the brain and may be an indicator of various
types of brain disorders. Time series analysis measurements help diagnose and
treat diseases of the nervous system such as epilepsy. More than 40 types of epilepsy, characterized by different energy distributions in the brain, may be distinguished by time series analysis.
In recent years, wavelet methods have been used to diagnose and treat
epilepsy. This common disease affects about 2% of the world’s population,
mostly children. In some epilepsy syndromes, interictal paroxysmal discharges
of cerebral neurons indicate the severity of the disorder and may contribute
to other disturbances of cerebral functions such as speech impairment and
behavioral changes. Studies of energy spikes within the brain serve as important
diagnostic tools.
Electrooculography (EOG)
This technique measures the resting potential of the retina of the eye. The
eyeball functions as a dipole with an anterior positive pole at the cornea and
a posterior negative pole at the retina. Periorbital surface electrodes allow
recording of eye movements during sleep. The two eye movement patterns
identified in EOG signals are rapid eye movements (REMs) characterized by
sharp deflections, and slow eye movements (SEMs), indicated by a somewhat
sinusoidal pattern of low frequency (≈ 0.2 to 0.6 Hz).
Recognition of the occurrence of these eye movements is important for
clinicians for identifying and characterizing sleep disturbances. EOG provides
reliable methods for identifying REMs and SEMs and separating them from
other activities and artifacts, thus overcoming the problems related to manual
scoring. Several algorithms based on filtering techniques have been developed
for automatic detection of REMs showing better performance than
conventional methods based on Fourier transform.
A new approach has been introduced for localizing SEM events based on
wavelet decomposition and multiresolution framework that involves comput-
ing energies at the different scales of decomposition.

Wavelet methods
Wavelet methods have emerged as a key time frequency analysis and coding
tool for EEG and other types of signals. Wavelet transforms and scalograms
act like microscopes by zooming into small structures to reveal time events
and into large structures to determine wave form trends. Energetic parame-
ters of signals are analyzed to extract details of specific events. The techniques
are based on the different frequencies of transient events that exhibit different
energy contents.
A paper by Magosso et al. [58] summarizes recent developments and cites
the Daubechies technique and other types of wavelets such as Coiflets and
Symlets. The performance of various wavelets has been studied; the results
indicate that wavelet transforms outperform the short-time Fourier (Gabor) transform.
A wavelet is a quickly vanishing oscillating function, localized in time. A continuous or discrete signal can be decomposed into scaled and translated versions ψa,b(t) of a single function ψ(t) called a mother wavelet:


ψa,b(t) = (1/√|a|) ψ((t − b)/a)

where a and b are the scale and translation parameters, respectively, with
a, b ∈ R and a ≠ 0.
The continuous wavelet transform (CWT) of a signal f ∈ L²(R), the space of finite energy signals (square integrable functions), is defined as

W(a, b) = Ca,b = ∫_{−∞}^{∞} f(t) (1/√|a|) ψ*((t − b)/a) dt = ⟨f, ψa,b⟩,

where ⟨·, ·⟩ denotes the inner product (dot product) and the symbol * means complex conjugate. The CWT provides a redundant representation of the signal and requires a heavy burden of computation.
The discrete wavelet transform (DWT) is obtained by discretizing the parameters a and b. Choosing a = 2^{−j}, b = k 2^{−j} with j, k ∈ Z, we get

ψj,k(t) = 2^{j/2} ψ(2^j t − k).

The DWT can be written as

dj,k = ∫_{−∞}^{∞} f(t) 2^{j/2} ψ(2^j t − k) dt = ⟨f, ψj,k⟩

where dj,k are known as wavelet (or detail) coefficients at scale j and location k.
By appropriate selection of the mother wavelet ψ, the family {ψj,k(t)}, j, k ∈ Z, forms
an orthonormal basis for L²(R). The original signal can be reconstructed from
its DWT. Furthermore, the DWT may be interpreted in terms of a multiresolution
analysis, where a hierarchy of approximations and details of the signal is
constructed in nested subspaces of L²(R).
Given a signal f(t), its multiresolution decomposition at level h uses ψ(t) as the mother wavelet together with a companion function ϕ, called a scaling function:

ϕj,k(t) = 2^{j/2} ϕ(2^j t − k).

The approximation (or scaling) coefficients at level h are defined as

ah,k = ⟨f, ϕh,k⟩.


Extending the decomposition over all resolution levels, the complete wavelet expansion can be obtained as

f(t) = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} dj,k 2^{j/2} ψ(2^j t − k).
This equation expresses the synthesis of the original signal from wavelet coefficients.
Let a signal be of finite length N = 2^M. The approximation and detail signals at scale j will have only 2^{M−j} samples each, because of the downsampling operation. Coefficients at each scale j are placed at instants tj,k = k 2^j (k = 0, 1, 2, . . . , 2^{M−j} − 1).
Hence, the range of scales that can be investigated is 1 ≤ j ≤ M, since the decomposition can proceed only until the individual detail contains a single coefficient.
If the decomposition is carried out over all resolution levels M, the wavelet expansion will be

f(t) = Σ_{j=1}^{M} Σ_{k=0}^{2^{M−j}−1} dj,k 2^{j/2} ψ(2^j t − k).

Note that k starts from 0 since we assume, without loss of generality, that the
signal starts from t = 0.
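A minimal sketch of the DWT machinery just described, using the Haar wavelet (an illustrative choice; the text does not fix a particular mother wavelet): each level halves the number of samples, and M levels decompose a signal of length 2^M down to a single approximation coefficient.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: pairwise scaled sums give the
    approximation, pairwise differences the detail; each output has half the
    samples of the input (the downsampling noted above)."""
    s = math.sqrt(2.0)
    approx = [(signal[2 * i] + signal[2 * i + 1]) / s
              for i in range(len(signal) // 2)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / s
              for i in range(len(signal) // 2)]
    return approx, detail

def haar_dwt(signal):
    """Decompose a signal of length N = 2^M over all M levels, stopping when
    the approximation contains a single coefficient."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

approx, details = haar_dwt([4.0, 2.0, 5.0, 7.0])
print(approx, details)
```

Here M = 2: the first level yields two detail coefficients, the second a single detail and a single approximation coefficient.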
Scalograms
Scalograms are graphical representations of the squares of the wavelet coef-
ficients for the different scales. They are isometric views of sequences of the
wavelet coefficients versus wavelength. A scalogram clearly shows more details,
identifies the exact location and time and detects low frequency cyclicity of the
signal. The scalogram surface highlights the location (depth) and scale (wave-
length) of dominant energetic features within the signal. The combination of
the various vectors of coefficients at different scales (wavelengths) forms the
scalogram. The depths (location and time) with the largest (strongest) coeffi-
cients indicate the position where the particular wavelength change is taking
place. The scalogram provides a good space-frequency representation of the
signal.
Wavelet energy
The energy contained in a one-dimensional signal f(t) is

E = ∫_c^d |f(t)|² dt = ‖f‖².

The total energy contained in a two-dimensional signal f(x, y) is

E = ∫_{c1}^{d1} ∫_{c2}^{d2} |f(x, y)|² dx dy = ‖f‖².

If f is a discrete signal then

E = Σ_n Σ_m |f(m, n)|² = ‖f‖².

The wavelet transform decomposes a given signal into coefficients from which the original function can be reconstructed. The total energy of a given signal and that of its wavelet transform are identical.
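This energy preservation can be checked numerically. The sketch below uses one level of the orthonormal Haar transform (an illustrative choice, not prescribed by the text) and verifies that the signal energy equals the energy of its coefficients:

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (approximation, detail)."""
    s = math.sqrt(2.0)
    a = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(len(signal) // 2)]
    d = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(len(signal) // 2)]
    return a, d

def energy(xs):
    """Energy of a discrete signal: the sum of squared samples."""
    return sum(x * x for x in xs)

f = [3.0, 1.0, -2.0, 4.0, 0.5, 2.5, -1.0, 1.0]
a, d = haar_step(f)
print(energy(f), energy(a) + energy(d))  # the two numbers agree
```

The equality holds because the transform is orthonormal; it is the discrete analogue of Parseval's relation.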
A scalogram is a graphical representation of the squares of the wavelet coefficients versus wavelength. A scalogram shows more details of a signal and detects low frequency cyclicities (Figure 11.10).

FIGURE 11.10: Scalogram

11.2.3 Prediction of Time Series Behavior Using Wavelets and Fractals
Fractal dimension and Hurst exponent analyses of geophysical time series are popular; see [1], [7], [8], [21], [22] and [39] through [44]. However, these analyses are mainly restricted to obtaining the fractal dimensions of various time series. Only in the last couple of years has the linkage of these dimensions to the dynamics of the time series of meteorological parameters such as temperature, pressure and precipitation been studied.
Climate involves four major components, namely geographical parameters
(latitude, longitude, distance from sea and height above mean sea level), tem-
perature, pressure and precipitation. Since geographical parameters do not
change significantly they are treated as constants. The climatic dynamics of
any country or continent depend on the behavior of time series of temper-
ature, pressure and precipitation. Mandelbrot noted that time series for a
climatic variable correspond to a Brownian motion. Consider a discrete time
series given by x(ti ), i = 1, 2, 3, . . . , N , where t denotes the time and x the
amplitude of the variable under consideration. For a fractional Brownian motion, the amplitude increments x(tj) − x(ti) have a Gaussian distribution with variance

⟨[x(tj) − x(ti)]²⟩ ∝ (tj − ti)^{2H},

where the symbol ⟨ ⟩ denotes the average over many samples of x(t) and H denotes the Hurst exponent taking values between 0 and 1. If H = 0.5, we obtain the usual Brownian motion. The Hurst exponent is related to the fractal dimension D of the time series curve by

D = 2 − H.
If the fractal dimension D for the time series is 1.5, we again get the usual Brownian motion. In this case there is no correlation between amplitude changes corresponding to two successive time intervals. Therefore, no trend in amplitude can be discerned from the time series and the process is unpredictable. However, as the fractal dimension decreases to 1.0, the process becomes more predictable as it exhibits persistence: the future or the past trend is more likely to follow an established trend [52]. As the fractal dimension increases from 1.5 to 2.0, the process exhibits anti-persistence; a decrease in amplitude of the process is more likely to lead to an increase in the future or past trend. Hence the predictability again increases. The scale independent unit thus gives us the predictability of the acting process. References [39] through [40], [41] through [44] and [53] deal with the prediction of meteorological behavior using fractal methods for data from India, Saudi Arabia and Mexico.
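A rough sketch of estimating H from the variance relation above (the regression-on-lags device below is a standard method, not the wavelet-based procedure discussed later): for ordinary Brownian motion the estimate should come out near H = 0.5, i.e., D = 2 − H near 1.5.

```python
import math
import random

def hurst_from_increments(x, lags=(1, 2, 4, 8, 16)):
    """Estimate H from <[x(t+tau) - x(t)]^2> ~ tau^(2H): regress the log of
    the mean squared increment on log tau; the slope of the fit is 2H."""
    log_tau, log_msq = [], []
    for tau in lags:
        inc = [x[i + tau] - x[i] for i in range(len(x) - tau)]
        msq = sum(v * v for v in inc) / len(inc)
        log_tau.append(math.log(tau))
        log_msq.append(math.log(msq))
    n = len(lags)
    mt, mv = sum(log_tau) / n, sum(log_msq) / n
    slope = (sum((t - mt) * (v - mv) for t, v in zip(log_tau, log_msq))
             / sum((t - mt) ** 2 for t in log_tau))
    return slope / 2.0

# Ordinary Brownian motion (cumulative sum of Gaussian noise): H should be
# close to 0.5, so the fractal dimension D = 2 - H is close to 1.5.
rng = random.Random(1)
walk, s = [], 0.0
for _ in range(20000):
    s += rng.gauss(0.0, 1.0)
    walk.append(s)
H = hurst_from_increments(walk)
print(H, 2 - H)
```

For a persistent trace (H > 0.5) the slope steepens and D drops below 1.5, matching the predictability discussion above.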
Fractal dimension (D)
Fractal dimension is a numerical measure of the roughness of an object. We are familiar with the one dimension of a straight line, or the two dimensions of a plane. What about a fractal dimension between the two? A mathematical definition of this concept is given in Section 11.1.1. A relationship between the Hurst exponent (H) and the fractal dimension (D) was given earlier. Computation of H is described in Section 11.1.1. There are several software programs available for the computation of fractal dimensions and Hurst exponents; see for example MATLAB and Benoit software.
Wavelet analysis is a tool for analyzing localized variations in power by decomposing a trace into time-frequency space to determine both the dominant modes of variability and how those modes vary in time. This method is appropriate for analysis of non-stationary traces, that is, where the variance does not remain constant with increasing length of the data set. Fractal
properties are present where the wavelet power spectrum is a power law func-
tion of frequency. The wavelet method is based on the property that wavelet
transforms of self-affine traces have self-affine properties. Consider n wavelet
transforms each with a different scaling coefficient ai , where S1 , S2 , . . . , Sn
are the standard deviations from zero of the respective scaling coefficients ai .
Define the ratio of the standard deviations G1 , G2 , . . . , Gn−1 as:

G1 = S1 /S2 , G2 = S2 /S3 , . . . , Gn−1 = Sn−1 /Sn .

Estimate the average value of Gi as

Gavg = (1/(n − 1)) Σ_{i=1}^{n−1} Gi.

The Hurst exponent (H) is H = f(Gavg), where f is a heuristic function which approximates the Hurst exponent from Gavg for stochastic self-affine traces.

11.2.4 Fractal Dimension and Predictability

Recall that D = 2 − H. If the fractal dimension D for the time series is 1.5, there is no correlation between amplitude changes corresponding to two successive time intervals. Therefore, no trend in amplitude can be discerned from the time series and the process is unpredictable. However, as the fractal dimension decreases to 1.0, the process becomes more predictable as it exhibits persistence. Predictability indices (denoted by PIT, PIP and PIR) for temperature, pressure, and precipitation are defined as follows:

PIT = 2|DT − 1.5|; PIP = 2|DP − 1.5|; PIR = 2|DR − 1.5|.

Concepts of fractals and multifractals and their relevance to real world systems were introduced by Benoit Mandelbrot; for updated references and an interesting introduction to the theme we refer to Mandelbrot and Hudson [27]. In many real world systems represented by time series, the pattern of singularities shown by a graph of points changes abruptly and analysis is a challenging task. The time series of rainfall data are usually fractal or multifractal.
Wavelet-based Hurst exponent and fractal dimensional analysis
Saudi Arabia climatic dynamics
The climate predictability indices were obtained for nine meteorological stations using wavelet-based Hurst exponents and fractal dimensions [42]. The meteorological data, which included the ambient temperature, barometric pressure, precipitation, relative humidity and wind speed, covered the 16 years between 1990 and 2005. The major observations of the study are summarized as follows.
The temperature was found to be strongly predictable for all the stations except Abha when evaluated using the entire data set. The process became completely unpredictable during winter and summer, with PIT values much less than 0.5; in the summer season the situation was almost the same except for Jeddah, Yanbu and Guryat, where PIT was greater than 0.5. The pressure data showed strong persistence behavior at all the stations when evaluated using the entire data set and became anti-persistent at most of the stations during the winter season, with the exception of Hail, Guryat, Turaif and Riyadh. In the summer season, the pressure predictability was found to be strong at most of the locations except for Gizan and Yanbu, where PIP was less than 0.5. The relative humidity was strongly correlated with its previous values for all the stations when evaluated with the entire data set, and the index varied between 0.58 and 0.91. During winter the process was less predictable. The precipitation predictability indices PIR were found to be independent of the length of the data set, i.e., complete, winter or summer sets, with a few exceptions. Furthermore, the precipitation predictability indices were also found to be independent of the temperature and pressure predictability indices. The wind speed was always found unpredictable in Hail, Guryat and Turaif and predictable in Abha based on the entire data set.
Moreover, the wind speed predictability was found to be independent of the temperature and pressure predictability indices irrespective of seasonality. It is recommended that predictability indices for other meteorological stations be established and detailed monthly studies be conducted.

Software list
• Benoit™ software, developed by B. Mandelbrot

• LastWave, http://www.cmap.polytechnique.fr/~bacry/Lastwave, developed by E. Bacry et al.

• WaveLab, http://www-stat.stanford.edu/~wavelab/, developed by D. Donoho et al.

• FracLab, http://fractals.inria.fr, developed by J. Lévy Véhel et al.



11.3 Introduction to Neural Networks


Neural network simulations are recent developments. Currently, the neu-
ral network field enjoys a resurgence of interest and a corresponding increase
in funding. Neural networks, with their remarkable ability to derive meaning
from complicated or imprecise data, can be used to extract patterns and de-
tect trends that are too complex to be noticed by humans or basic computer
techniques. A trained artificial neural network (ANN) can be thought of as an
expert in the category of information it has been given to analyze. This expert
can then be used to provide projections given new situations of interest and
answer "what if" questions. Other advantages are listed below:
Adaptive learning: An ability to learn how to do tasks based on the data
acquired by training or initial experience.
Self-organisation: An ANN can create its own organization or representa-
tion of the information it receives during learning time.
Real time operation: ANN computations may be carried out in parallel,
and special hardware devices take advantage of this capability.
Fault tolerance via redundant information coding: Partial destruction
of a network leads to the corresponding degradation of performance. However,
some network capabilities may be retained even with major damage.
Definition 113. A technical neural network consists of simple processing
units called neurons and directed, weighted connections between those neu-
rons. Here, the strength of a connection (or the connecting weight) between
two neurons i and j is referred to as wi,j.
Artificial neural networks (ANNs) are used for effective and efficient modelling of large and complex problems, in classification problems (where the output is a categorical variable) and in regressions (where the output variable is continuous). An ANN is a powerful non-linear modelling approach based on the function of the human brain. It identifies and learns the correlated patterns between input data sets and target values. An ANN is a network of simple processing nodes, or neurons, interconnected in a specific order to perform simple numerical manipulations.
Definition 114. A neural network is a sorted triple (N, V, w) with two sets N, V and a function w, where N is the set of neurons and V is a set {(i, j) | i, j ∈ N} whose elements are called connections between neuron i and neuron j. The function w : V → R defines the weights: a network = neurons + weighted connections, where w((i, j)), the weight of the connection between neuron i and neuron j, is shortened to wi,j. Depending on the implementation, wi,j is either undefined or 0 for connections that do not exist in the network. The weights can be implemented
Miscellaneous Topics Used for Engineering Problems 685

in a square weight matrix W or in a weight vector W, with the row number of the matrix indicating where the connection begins and the column number indicating which neuron is the target. In this case a 0 entry marks a non-existing connection. This matrix representation is also called a Hinton diagram.
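As a minimal sketch (with made-up neuron indices and weights), such a weight matrix with 0 marking non-existing connections can be built as follows:

```python
# Sketch: storing the weights w_{i,j} of a small network in a square
# matrix W, where row i is the source neuron and column j the target.
# A 0 entry marks a connection that does not exist.

def make_weight_matrix(n, connections):
    """connections: dict mapping (i, j) pairs to weights w_{i,j}."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in connections.items():
        W[i][j] = w
    return W

# Three neurons: 0 -> 1 with weight 0.5, 1 -> 2 with weight -1.2.
W = make_weight_matrix(3, {(0, 1): 0.5, (1, 2): -1.2})
```

Reading `W[i][j]` then gives the weight of the connection from neuron i to neuron j, or 0 if there is none.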
These networks consist of an input layer whose nodes represent the different input variables, a hidden layer consisting of many hidden nodes, and an output layer consisting of the output variables. An ANN rests on four common assumptions:

1. Information processing occurs at many simple elements called neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight that in a typical neural net multiplies the signal transmitted.
4. Each neuron applies an activation function (usually non-linear) to its net input (sum of weighted input signals) to determine its output signal.

Definition 115. An artificial neural network can be defined as an information processing system consisting of many processing elements joined in a structure inspired by the cerebral cortex of the brain.
The processing elements are usually organized in a sequence of layers with
full connections between layers. Typically, there are three (or more) layers:
an input layer where data are presented to the network through an input
buffer, an output layer with a buffer that holds the output response to a given
input, and one or more intermediate or hidden layers. The operation of an
artificial neural network involves processes of learning and recall. Learning is
the process of updating the connection weights in response to external stimuli
presented at the input buffer. The network “learns” in accordance with a
rule governing the adjustment of connection weights in response to learning
examples applied at the input and output buffers. Recall is the process of
accepting an input and producing a response determined by the geometry
and synaptic weights of the network.
Processing units
Each processing element (or neuron) receives inputs (signals) from neighbors
or external sources and uses them to compute an output signal which is prop-
agated to other units. Along with this processing, the weights are adjusted.
The system is inherently parallel in the sense that many units can carry out
their computations at the same time. During operation, units can be updated
synchronously or asynchronously. For synchronous updating, all units update
their activation simultaneously and for asynchronous updating, each unit has
a (usually fixed) probability of updating its activation at a time t and usually
only one unit will be able to do this at a time.
686 Modern Engineering Mathematics

FIGURE 11.11: Neural Network Layer Diagram

Artificial neural networks


Activation and output functions
Signals are processed by neurons using two functions. The activation function determines the total signal received by the neuron. In most cases, a linear combination of the incoming signals is used. For neuron i connected to neurons j (for j = 1, . . . , N) sending signals x_j with connection strengths w_{ij}, the total activation signal I_i is
$$I_i(x) = \sum_{j=1}^{N} w_{ij}(t)\, x_j.$$

The output function determines the neuron's signal processing and is denoted by o(I). Together these two functions determine the values of the outgoing neuron signals. The total activation acts on the N-dimensional input space, called the parameter space. The composition of these two functions is called the transfer function o(I(x)). The activation and output functions of the input and output layers (Figure 11.11) may be of different types than those of the hidden layer. In general, linear functions are used for the input and output layers and non-linear functions for the hidden layers.
Single neuron
The basic unit of computation in a neural network is the neuron (Figure 11.12)
often called a node or unit. It receives input from other nodes or from an
external source and computes an output. Each input has an associated weight
(w), which is assigned on the basis of its relative importance to other inputs.
The node applies a function f (defined below) to the weighted sum of its
inputs as shown in the figure:
The network takes numerical inputs X1 and X2 and has weights w1 and w2 as-
sociated with those inputs. Additionally, another input 1 with weight b (called
the bias) is associated with it. The output Y from the neuron is computed as

Inputs X1 and X2; output of neuron: Y = f(w1 · X1 + w2 · X2 + b)

FIGURE 11.12: Single Neuron

shown in the figure. The function f is non-linear and is called the activation
function (Figure 11.13). The purpose of the activation function is to introduce
non-linearity into the output of a neuron. This is important because most real
world data is non-linear and we want neurons to learn non-linear represen-
tations. Every activation function (or non-linearity) takes a single number
and performs a certain fixed mathematical operation on it. There are several
activation functions you may encounter in practice:

1. The sigmoid function takes a real-valued input to a range between 0 and 1:
σ(x) = 1/(1 + exp(−x)).
2. The tanh function squashes a real-valued input to the range [−1, 1]:
tanh(x) = 2σ(2x) − 1.
3. The rectified linear unit (ReLU) thresholds a real-valued input by replacing negative values with 0:
f(x) = max(0, x).

Figure 11.13 shows the three activation functions above.
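A minimal Python sketch of these three activation functions and of the single-neuron computation Y = f(w1·X1 + w2·X2 + b) from Figure 11.12; the helper names are our own:

```python
import math

# The three activation functions above, plus a single neuron
# Y = f(w1*X1 + w2*X2 + b) with bias input 1 weighted by b.

def sigmoid(x):
    # maps a real-valued input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # tanh expressed through the sigmoid: tanh(x) = 2*sigma(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # replaces negative values with 0
    return max(0.0, x)

def neuron(x1, x2, w1, w2, b, f):
    return f(w1 * x1 + w2 * x2 + b)
```

For example, `neuron(1.0, 0.0, 2.0, 2.0, -1.0, relu)` computes relu(2·1 + 2·0 − 1) = 1.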

Network topologies
Feed forward network: The data flow from input to output units is strictly
feed forward. The data processing can extend over multiple (layers of) units,

FIGURE 11.13: Different Activation Functions: Sigmoid, tanh and ReLU

but no feedback connections extending from outputs of units to inputs of units in the same layer or previous layers are present.
Recurrent networks can contain feedback connections. Here the dynamical properties of the network are important. In some applications, the activation values of the units undergo a relaxation process such that the network evolves to a stable state in which these activations no longer change. In other applications, the changes of the activation values of the output neurons are themselves significant: this dynamical behaviour constitutes the output of the network.
An example of a feedforward neural network is shown in Figure 11.14. A

lnputl.¥r Hidden l.lyer OUtput lAyer

Output I

Outpu12

FIGURE 11.14: Example of Feedforward Neural Network

network can consist of three types of nodes:


Input nodes provide information from the outside world to the network and
are referred to as the input layer. No computation is performed in input nodes;
they just pass on the information to the hidden nodes.
Hidden nodes have no direct connection with the outside world (hence the name “hidden”). They perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a hidden layer. While a feedforward network has only a single input layer and a single output layer, it can have zero or more hidden layers.

Output nodes are collectively referred to as the output layer and are respon-
sible for computations and transferring information from the network to the
outside world.
In a feedforward network, the information moves in only one direction: forward
from the input nodes, through the hidden nodes (if any) to the output nodes.
There are no cycles or loops in the network. This property of feedforward
networks is different from recurrent neural networks in which the connections
between the nodes form a cycle.
Feedforward Neural Networks
The single layer perceptron is the simplest type: it has no hidden layer and can learn only linear functions. A multilayer perceptron (MLP) contains
one or more hidden layers along with an input layer and an output layer. It
can handle linear and non-linear functions. Figure 11.15 shows a multilayer
perceptron with a single hidden layer. All connections of an MLP have asso-
ciated weights. The figure shows only three weights (w0 , w1 , w2 ).
Layer Description
The input layer has three nodes. The bias node has a value of 1. The other
two nodes have external inputs values of X1 and X2 and depend on the input
data set. No computations are performed in the input layer. The 1, X1 and
X2 outputs are fed into the hidden layer.
The hidden layer also has three nodes and the bias node has an output of 1.
The outputs of the other nodes depend on the 1, X1 , X2 outputs and weights
associated with the connections (edges). The output connection from one of the hidden nodes is highlighted in Figure 11.15. The output from the other hidden node can be calculated similarly. Remember that f denotes the activation function. The outputs of the hidden layer are fed to the output layer.
The output layer has two nodes that receive inputs from the hidden layer and
perform computations similar to those shown for the highlighted hidden node
in the figure. The Y1 and Y2 values calculated act as outputs of a multilayer
perceptron.
Given a set of features X = (x1, x2, . . .) and a target y, a multilayer perceptron can learn the relationship between the features and the target, for either classification or regression.
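The forward pass of the multilayer perceptron of Figure 11.15 can be sketched as follows; the weight values below are made-up illustrative numbers, and the sigmoid is chosen as the activation function f:

```python
import math

# Sketch of the forward pass of Figure 11.15: a bias of 1 plus inputs
# X1, X2 feed two hidden neurons; their outputs (plus a hidden bias
# of 1) feed the two output neurons Y1 and Y2.

def f(x):
    # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def mlp(x1, x2, Wh, Wo):
    # Wh: two rows (w0, w1, w2) for the hidden neurons; w0 weighs the bias 1
    hidden = [f(w0 * 1 + w1 * x1 + w2 * x2) for (w0, w1, w2) in Wh]
    # Wo: two rows (w0, w1, w2) for the output neurons Y1 and Y2
    return [f(w0 * 1 + w1 * hidden[0] + w2 * hidden[1]) for (w0, w1, w2) in Wo]

# Illustrative weights only.
Y = mlp(0.5, -0.5,
        Wh=[(0.1, 0.4, -0.3), (-0.2, 0.8, 0.6)],
        Wo=[(0.0, 1.0, -1.0), (0.3, -0.7, 0.2)])
```

`Y` holds the two outputs Y1 and Y2, each in (0, 1) because of the sigmoid.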
Example 323. (XOR Problem) Some problems cannot be solved by any
perceptron. In fact there are more such problems than problems which can be
solved using perceptrons. The most often quoted example is the XOR problem
of building a perceptron which takes two Boolean inputs and outputs the XOR
of them. What we want is a perceptron which will output one if the two inputs
are different and 0 otherwise.

Input Layer Hidden Layer Output Layer

Yl

Y2

Output from the = j(summation) = j(wO . 1 + wl. Xl + w2. X2)


highlighted neuron

FIGURE 11.15: Multilayer Perceptron with One Hidden Layer

Input    Desired Output
0 0      0
0 1      1
1 0      1
1 1      0

Consider the following perceptron as an attempt to solve the problem.


If the inputs are both 0, the net input is 0 which is less than the threshold.

FIGURE 11.16

So the output is the desired 0. If one of the inputs is 0 and the other is 1, then the net input is 1. This is above the threshold, and so the output 1 is obtained. But if both inputs are 1, the net input is 2, which is also above the threshold, so the output is 1 although the desired output is 0; the given perceptron therefore fails in that case. To see that no perceptron can be built to solve the problem, try to build one. Now we need to cover an
important point regarding the input data and the desired output. We use the
binary OR operator as an example to explain the function of the weights and
threshold. With OR we want a binary output so a single perceptron with two
inputs is created. Now the search space for the neural network can be drawn
as shown in Figure 11.17.
FIGURE 11.17: OR Operator Modelled by Single Perceptron

The dark dots represent values of true and the light dot represents a value of false; you can clearly see that the two classes are separable. We can draw a line separating them as in the above example. This separating line is called a hyperplane. A single neuron can create a single hyperplane, so the above function can be solved by a single neuron. Another important point is that the hyperplane above is a straight line, which means we used a linear activation function (i.e., a step function) for our neuron. If we used a sigmoid or similar function, the hyperplane would resemble a sigmoid shape as seen in Figure 11.18. The hyperplane generated thus depends on the activation function used.
Remember the threshold (bias) value cited earlier? What does that
do? Simply put, it shifts the hyperplane left and right while the weights ori-
entate the hyperplane. In graphical terms the threshold translates the hyper-
plane while the weights rotate it. This threshold also needs to be updated
during the learning process. The basic procedure is as follows:

1. Run input pattern through function.


2. Calculate error (desired versus actual value).


FIGURE 11.18: Hyperplane OR Operator Modelled by Single Perceptron

3. Update weights according to learning rate and error.


4. Move onto next pattern.

The learning rate term is very important because it greatly affects the performance and accuracy of a network. For more details we refer to [20], [23] and [63].
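The four-step procedure above can be sketched for a single perceptron with a step activation and the threshold folded in as a bias weight. Trained on OR it converges; consistent with Example 323, no weight setting reproduces XOR. The learning rate and epoch count are illustrative:

```python
# Sketch of the perceptron learning procedure above (step activation,
# threshold represented as a trainable bias weight w0).

def step(x):
    return 1 if x >= 0 else 0

def train(patterns, rate=0.1, epochs=50):
    w0 = w1 = w2 = 0.0                         # bias and input weights
    for _ in range(epochs):
        for (x1, x2), target in patterns:
            y = step(w0 + w1 * x1 + w2 * x2)   # 1. run input pattern
            err = target - y                   # 2. calculate error
            w0 += rate * err                   # 3. update weights
            w1 += rate * err * x1
            w2 += rate * err * x2              # 4. move to next pattern
    return w0, w1, w2

def accuracy(w, patterns):
    w0, w1, w2 = w
    return sum(step(w0 + w1 * x1 + w2 * x2) == t for (x1, x2), t in patterns)

OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

Running `accuracy(train(OR), OR)` yields 4 (all patterns correct), while training on XOR always leaves at least one pattern wrong, since XOR is not linearly separable.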

11.4 Introduction to Fuzzy and Neuro-fuzzy


Definition 116. (Fuzzy sets) A classical crisp set is a collection of distinct objects. Crisp set theory was founded by the German mathematician Georg Cantor (1845-1918). A crisp set divides the elements of a given universe into two groups, members and non-members, and is defined by a characteristic function. Let U be a universe of discourse. The characteristic function µ_A(x) of a crisp set A in U is defined as
$$\mu_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

Zadeh introduced fuzzy sets with flexible membership. In fuzzy sets, many
degrees of membership are allowed. The degree of membership to a set is indi-
cated by a number between 0 and 1. Fuzzy sets can be considered extensions
and generalizations of the crisp sets.

A fuzzy set A in the universe of discourse U can be defined as a set of ordered pairs
A = {(x, µ_A(x)) | x ∈ U}
where µA is called the membership function of A and µA (x) is the degree of
membership of x in A which indicates the degree that x belongs to A. The
membership function maps U to the membership space M that is µA : U →
M.
When M = {0, 1}, the set A is non-fuzzy and µ_A is the characteristic function of the crisp set A. For a fuzzy set, the range of the membership function is a subset of the non-negative real numbers. In general, M is the unit interval [0, 1].
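A small sketch contrasting a characteristic function with a membership function; the universe (ages 0 to 100) and the particular shapes are our own illustrations:

```python
# Sketch: a crisp set via its characteristic function versus a fuzzy
# set via a membership function on the universe U = [0, 100] ("age").

def crisp_adult(x):
    # characteristic function: 1 iff x is a member, 0 otherwise
    return 1 if x >= 18 else 0

def fuzzy_young(x):
    # made-up membership function for the fuzzy set "young":
    # full membership up to 20, none from 40, linear in between
    if x <= 20:
        return 1.0
    if x >= 40:
        return 0.0
    return (40 - x) / 20.0
```

The crisp set returns only 0 or 1; the fuzzy set returns intermediate degrees such as fuzzy_young(30) = 0.5.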
Definition 117. Membership functions (MFs) can be represented by follow-
ing functions:
Triangular: A triangular MF is a function with three parameters defined by
$$\mathrm{triangle}(x, a, b, c) = \max\left(\min\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right).$$
Trapezoidal: A trapezoidal MF is a function with four parameters defined by
$$\mathrm{trapezoid}(x, a, b, c, d) = \max\left(\min\left(\frac{x-a}{b-a}, 1, \frac{d-x}{d-c}\right), 0\right).$$
FIGURE 11.19: Membership Functions: (a) Triangle (b) Trapezoid

Gaussian: A Gaussian MF is a function with two parameters defined by
$$\mathrm{gaussian}(x, \sigma, c) = e^{-\left(\frac{x-c}{\sigma}\right)^2},$$
where c is the center and σ is the width of the membership function.

Bell: A bell MF (Figure 11.20) is a function with three parameters defined by
$$\mathrm{bell}(x, a, b, c) = \frac{1}{1 + \left|\frac{x-c}{a}\right|^{2b}}.$$

Sigmoidal: A sigmoidal MF is a function with two parameters defined by
$$\mathrm{sigmoid}(x, k, c) = \frac{1}{1 + e^{-k(x-c)}},$$

FIGURE 11.20: Bell Membership Function

where the parameter k influences the sharpness of the function at the point x = c. If k > 0 the function is open on the right side; if k < 0 the function is open on the left side. The function can thus be used for describing concepts like “very big” or “very small”. The sigmoidal function is also used in neural networks as an activation function.
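The membership functions above can be sketched directly in Python; parameter names follow the text, and the trapezoidal MF uses its standard four-parameter form:

```python
import math

# Sketch of the membership functions defined above.

def triangle(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):
    # flat top of height 1 between b and c
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, sigma, c):
    # c is the center, sigma the width
    return math.exp(-((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sigmoid_mf(x, k, c):
    # k controls the sharpness at x = c
    return 1.0 / (1.0 + math.exp(-k * (x - c)))
```

Each function peaks at 1 at its center (the sigmoidal MF instead crosses 0.5 at x = c), matching the shapes in Figures 11.19 and 11.20.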
Fuzzy rules and fuzzy reasoning
Fuzzy rules and fuzzy reasoning are the main frames of fuzzy inference systems
used as modelling tools based on fuzzy set theory. They are used in real-world
problems such as expert systems, pattern recognition and data classification.
Fuzzy reasoning, also known as approximate reasoning, is an inference pro-
cedure that derives conclusions from a set of fuzzy if-then rules and known
facts.
Fuzzy if-then rules
Fuzzy if-then rules (also known as fuzzy conditional statements) are expres-
sions of the form
If x is A, then y is B
where A and B are linguistic labels defined by fuzzy sets on universes of
discourse X and Y , respectively. Often “x is A” is called the antecedent or
premise while “y is B” is called the consequence or conclusion. Fuzzy if-then
rules are used to capture the imprecise modes of reasoning and play an essen-
tial role in the human ability to make decisions in an environment of uncer-
tainty and imprecision. Fuzzy if-then rules have been used extensively in both
modelling and control. Also, due to the qualifiers on the premise parts each
fuzzy if-then rule can be viewed as a local description of the system under
consideration.
Fuzzy inference systems
The fuzzy inference system (Figure 11.21) is a popular computing framework
based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy rea-
soning. It is used in automatic control, data classification, decision analysis,

expert systems, robotics and pattern recognition. The fuzzy inference system
is also known as the fuzzy expert system, fuzzy model, fuzzy-rule-based sys-
tem, fuzzy logic controller and fuzzy system. The basic structure of a fuzzy
inference system consists of five functional components:
1. Rule base which contains a selection of fuzzy rules.
2. Database which defines the membership functions used in the fuzzy
rules.
3. Reasoning mechanism which performs the inference procedure upon the
rules and given facts to derive a reasonable conclusion.
4. Fuzzification interface which transforms the crisp inputs into degrees of
match with linguistic values.
5. Defuzzification interface which transforms the fuzzy results of the infer-
ence into a crisp output.

FIGURE 11.21: Fuzzy Inference System

The steps of fuzzy reasoning (inference operations upon fuzzy if-then rules)
performed by fuzzy inference systems are:
1. Compare the input variables with the membership functions on the an-
tecedent part to obtain the membership values of each linguistic label
(fuzzification step).
2. Combine (through a specific T-norm operator, usually multiplication or
min) the membership values on the premise part to get firing strength
(weight) of each rule.
3. Generate the qualified consequents (either fuzzy or crisp) of each rule
depending on the firing strength.
4. Aggregate the qualified consequents to produce a crisp output (defuzzi-
fication step).

Mamdani fuzzy model


The Mamdani fuzzy inference system was proposed as the first attempt to control a steam engine and boiler combination by a set of linguistic control rules obtained from experienced human operators. An example of an if-then rule is: if the pressure is high, then the volume is low, where pressure and volume are linguistic variables and high and low are linguistic values or labels characterized by membership functions.
Tsukamoto fuzzy model
In the Tsukamoto fuzzy models the consequent part of each fuzzy if-then rule
is specified by a membership function of a step function centered at a constant.
As a result the inferred output of each rule is defined as a crisp value induced
by the rule’s firing strength. The overall output is taken as the weighted av-
erage of each rule’s output. Fuzzy models avoid the time consumed by the
defuzzification process since it aggregates each rule’s output by the method
of weighted average. This fuzzy model has the limitation that it is not trans-
parent.
Sugeno fuzzy model
The Sugeno fuzzy model (also known as the TSK fuzzy model) was proposed
by Takagi, Sugeno and Kang in an effort to develop a systematic approach to
generating fuzzy rules from an input-output data set. The Sugeno fuzzy model
is implemented in the neural fuzzy system ANFIS. A typical fuzzy rule has the format: if x is A and y is B, then z = f(x, y), where A and B are fuzzy sets in the antecedent and z = f(x, y) is a crisp function in the consequent part; f(x, y) is a polynomial in the input variables x and y that can appropriately describe the output of the system within the fuzzy region specified by the antecedent part of the rule.
If f (x, y) is a first-order polynomial, the first-order Sugeno fuzzy model
is obtained. If f is a constant the zero-order Sugeno fuzzy model can be
viewed either as a special case of the Mamdani fuzzy inference system where
each rule’s consequent part is specified by a fuzzy set or a special case of
Tsukamoto’s fuzzy model. A zero-order Sugeno fuzzy model is functionally
equivalent to a radial basis function network under certain minor constraints.
Using Takagi and Sugeno's fuzzy if-then rule, the resistant force on a moving object is defined by: if velocity is high, then force = k · (velocity)², where high in the premise part is a linguistic label characterized by an appropriate membership function. The consequent part is described by a non-fuzzy equation of the input variable velocity.
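Sugeno-type inference for this rule can be sketched as a weighted average of rule outputs. The membership function for high, the complementary low rule with consequent 0, and the constant k are all illustrative assumptions:

```python
# Sketch of Sugeno-type inference for the rule
#   "if velocity is high then force = k * velocity**2",
# paired with a made-up rule "if velocity is low then force = 0".

def mu_high(v):
    # illustrative ramp membership for "high": 0 below 10, 1 above 20
    return min(max((v - 10.0) / 10.0, 0.0), 1.0)

def force(v, k=2.0):
    w_high = mu_high(v)          # firing strengths
    w_low = 1.0 - w_high
    f_high = k * v ** 2          # crisp rule consequents
    f_low = 0.0
    # overall output: weighted average of the rule outputs
    return (w_high * f_high + w_low * f_low) / (w_high + w_low)
```

At v = 15 both rules fire with strength 0.5, giving force(15) = 0.5 · (2 · 225) = 225.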
Neuro-fuzzy inference system
The basic concepts and rationale involve integrating fuzzy logic and neural networks into a working functional system. The combination of the techniques of fuzzy logic and neural networks leads to the adaptive neuro-fuzzy inference system (ANFIS), a universal estimator which may be able to approximate any real

continuous function on a compact set. The basic structure of this type of


fuzzy inference system is to map input characteristics to input membership
functions. The system then relates input membership function to rules and
the rules to a set of output characteristics. It then maps output characteristics
to output membership functions and output membership function to a single
output. Each fuzzy system contains a fuzzifier, fuzzy database and defuzzifier.
A fuzzy database includes fuzzy rule base and an inference engine.
ANFIS architecture
Following the Takagi-Sugeno type, consider a fuzzy inference system with two inputs x and y and one output f, as shown in Figure 11.22, with rules:
Rule 1: If x is A1 and y is B1 then f1 = p1 x + q1 y + r1.
Rule 2: If x is A2 and y is B2 then f2 = p2 x + q2 y + r2.
Layer 1: Every node i in this layer is a square node with a node function
$$O_i^1 = \mu_{A_i}(x)$$
where x is the input to node i and A_i is the linguistic label associated with this node function. O_i^1 is the membership function of A_i and it specifies the degree to which the given x satisfies the quantifier A_i. Assume µ_{A_i}(x) is bell-shaped with a maximum equal to 1 and minimum equal to 0, such as the generalized bell function
$$\mu_{A_i}(x) = \frac{1}{1 + \left|\frac{x-c_i}{a_i}\right|^{2b_i}}$$
or the Gaussian function
$$\mu_{A_i}(x) = e^{-\left(\frac{x-c_i}{\sigma}\right)^2}$$

where {ai , bi , ci } is the parameter set. As these parameters change, the bell
shaped functions vary accordingly. Any continuous and piecewise differentiable
functions such as commonly used trapezoidal or triangular-shaped member-
ship functions are also qualified candidates for node functions in this layer.
Parameters in this layer are referred as premise parameters.
Layer 2: Every node in this layer is a circle node which multiplies the incoming signals and sends the product out. For instance,
$$O_i^2 = w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2.$$
Each node output represents the firing strength of a rule.


Layer 3: Every node in this layer is a circle node labeled N. The ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:
$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}.$$

The outputs of this layer are called normalized firing strengths.
Layer 4: Every node i in this layer is a square node with a node function
$$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$$
where $\bar{w}_i$ is the output of layer 3 and (p_i, q_i, r_i) is the parameter set. Parameters in this layer are referred to as consequent parameters.
Layer 5: The single node in this layer is a circle node that computes the overall output as the summation of all incoming signals, i.e.,
$$O_1^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}.$$
Thus we have constructed an adaptive network which is functionally equivalent to a type 3 fuzzy inference system.
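The five layers can be sketched as a single forward pass for the two rules above, using Gaussian membership functions; all parameter values in the call are illustrative:

```python
import math

# Sketch of the five ANFIS layers for the two rules
#   Rule 1: if x is A1 and y is B1 then f1 = p1*x + q1*y + r1
#   Rule 2: if x is A2 and y is B2 then f2 = p2*x + q2*y + r2

def gauss(x, c, sigma):
    return math.exp(-((x - c) / sigma) ** 2)

def anfis(x, y, premise, consequent):
    # premise: (center, width) pairs for A1, A2, B1, B2
    (cA1, sA1), (cA2, sA2), (cB1, sB1), (cB2, sB2) = premise
    # Layer 1: membership degrees of x and y
    muA = [gauss(x, cA1, sA1), gauss(x, cA2, sA2)]
    muB = [gauss(y, cB1, sB1), gauss(y, cB2, sB2)]
    # Layer 2: firing strengths w_i = muA_i * muB_i
    w = [muA[0] * muB[0], muA[1] * muB[1]]
    # Layer 3: normalized firing strengths
    wbar = [wi / (w[0] + w[1]) for wi in w]
    # Layer 4: weighted rule outputs wbar_i * (p_i x + q_i y + r_i)
    f = [p * x + q * y + r for (p, q, r) in consequent]
    # Layer 5: overall output, the sum of the incoming signals
    return wbar[0] * f[0] + wbar[1] * f[1]

# Illustrative premise and consequent parameters.
out = anfis(1.0, 2.0,
            premise=[(0.0, 1.0), (2.0, 1.0), (0.0, 1.0), (3.0, 1.0)],
            consequent=[(1.0, 1.0, 0.0), (2.0, -1.0, 1.0)])
```

Because layer 5 forms a convex combination, the output always lies between the two rule outputs f1 and f2.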

FIGURE 11.22: Neuro-Fuzzy



Example 324. In some applications, such as expert systems, it is necessary to introduce formal methods capable of dealing with such expressions so that a computer using rigid Boolean logic can still process them. This is what the theory of fuzzy sets and fuzzy logic tries to accomplish.
Example 325. Consider the statement: if the food is delicious, then the tip is
high. The tip variable belongs to the high fuzzy set to a degree that depends on
the level of the validity of the premise, i.e., the membership degree of the food
variable to the delicious fuzzy set. The concept is that the more propositions
in a premise are checked, the more suggested output actions must be applied.
To determine the degree of truth of the tip is high fuzzy proposition, we must
define the fuzzy implication. The result of application of a fuzzy rule thus
depends on three factors:

1. The definition of fuzzy implication chosen


2. The definition of the membership function of the fuzzy set of the propo-
sition located at the conclusion of the fuzzy rule
3. The degree of validity of the propositions located in the premise

Since we understand the fuzzy operators AND, OR and NOT, the premise
of a fuzzy rule may well be formed from a combination of fuzzy propositions.
All the rules of a fuzzy system constitute the decision matrix. Here is the
decision matrix for our tip example:

If the service is bad or the food is awful then the tip is low.
If the service is good then the tip is average.
If the service is excellent or the food is delicious then the tip is high.
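This decision matrix can be sketched as a small fuzzy system: fuzzify service and food on a 0-10 scale, combine OR-premises with max, and defuzzify by a weighted average of representative tip levels (5, 15 and 25 percent). All shapes and levels are our own assumptions:

```python
# Sketch of the tip decision matrix above as a simple fuzzy system.

def up(x, a, b):
    # ramps from 0 at a to 1 at b
    return min(max((x - a) / (b - a), 0.0), 1.0)

def down(x, a, b):
    # ramps from 1 at a to 0 at b
    return 1.0 - up(x, a, b)

def tip(service, food):
    # rule firing strengths (max models the fuzzy OR)
    w_low = max(down(service, 2, 5), down(food, 2, 5))    # bad or awful
    w_avg = up(service, 2, 5) * down(service, 5, 8)       # good
    w_high = max(up(service, 5, 8), up(food, 5, 8))       # excellent or delicious
    total = w_low + w_avg + w_high
    # defuzzify: weighted average of the representative tip levels
    return (5 * w_low + 15 * w_avg + 25 * w_high) / total
```

For example, tip(0, 0) gives 5, tip(5, 5) gives 15 and tip(10, 10) gives 25, with smooth interpolation in between.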

11.5 Software for Time Series, Neural Network, Neuro-fuzzy and Fuzzy
Software programs relevant to the three categories above are listed in the following table:

Time Series             Neural Networks                  Neuro-Fuzzy and Fuzzy Systems
CRAN Task View          Brian                            Fuzzy Clustering
Eviews                  Emergent                         Learning Vector Quantization Demonstration
MATHEMATICA             Genesis                          MATHEMATICA
MATLAB Simulink         MATHEMATICA                      MATLAB Simulink
R                       MATLAB Simulink                  Multilayer Perceptron Training Demonstration
SAS                     Neural Lab                       Neuro-Fuzzy Control
SPSS                    Neuron                           Neuro-Fuzzy Function Approximation
Time Series Analysis    Next                             R
                        Stuttgart Neural Network
                        Simulator (SNNS)
                        R

11.6 Introduction to Graph Theory with Applications


Graphs
A linear graph (or simply a graph) G = (V, E) consists of a set of objects
V ={v1 , v2 , . . . } called vertices, and another set E = {e1 , e2 , . . . }, whose
elements are called edges, such that each edge ek is identified with an un-
ordered pair (vi , vj ) of vertices. The vertices (vi , vj ) associated with edge ek
are called the end vertices of ek .
Remark 82. The most common representation of a graph is by a diagram, in
which the vertices are represented as points and each edge as a line segment
joining its end vertices. The object shown in Figure 11.23 is a graph.
Observe that this definition permits an edge to be associated with a vertex pair (v_p, v_p); that is, the two end vertices of an edge may coincide.
Definition 118. An edge having the same vertex as both its end vertices is
called a self-loop (or simply a loop).

Example 326. Edge e1 in Figure 11.23 is a self-loop.


Note that the definition allows more than one edge associated with a pair of
vertices, for example, edges e4 and e5 in Figure 11.23. Such edges are referred
to as parallel edges.

Incidence and Degree


Definition 119. When a vertex vi is an end vertex of some edge ej , vi and
ej are said to be incident with (on or to) each other.

es
vs

es V4

FIGURE 11.23: Graph with Five Vertices and Seven Edges

Example 327. In Figure 11.23 edges e2, e6 and e7 are incident with vertex v4.

Definition 120. Two nonparallel edges are said to be adjacent if they are
incident on a common vertex.

Example 328. e2 and e7 in Figure 11.23 are adjacent. Similarly, two vertices are said to be adjacent if they are the end vertices of the same edge. In the figure, v4 and v5 are adjacent, but v1 and v4 are not. The number of edges incident on a vertex vi, with self-loops counted twice, is called the degree, d(vi), of vertex vi.

Example 329. In Figure 11.23, d(v1) = d(v3) = d(v4) = 3, d(v2) = 4, and d(v5) = 1. The degree of a vertex is sometimes also referred to as its valency. Let us now consider a graph G with e edges and n vertices v1, v2, . . ., vn. Since each edge contributes two degrees, the sum of the degrees of all vertices in G is twice the number of edges in G. That is,
$$\sum_{i=1}^{n} d(v_i) = 2e.$$

Taking Figure 11.23 as an example once more, d(v1) + d(v2) + d(v3) + d(v4) + d(v5) = 3 + 4 + 3 + 3 + 1 = 14, twice the number of edges. We shall now derive the following interesting result.

Theorem 81. The number of vertices of odd degree in a graph is always even.
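Both the degree-sum identity and Theorem 81 can be checked on a small graph with a self-loop and a pair of parallel edges. The edge list below is reconstructed so that the degrees match those quoted in Example 329; it is illustrative, not taken verbatim from Figure 11.23:

```python
from collections import Counter

# Sketch: compute vertex degrees from an edge list, counting a
# self-loop (u == v) twice, and check sum of degrees = 2e.

def degrees(vertices, edges):
    d = Counter({v: 0 for v in vertices})
    for (u, v) in edges:
        d[u] += 1
        d[v] += 1      # for a self-loop this adds 2 to the same vertex
    return d

V = ["v1", "v2", "v3", "v4", "v5"]
E = [("v1", "v1"),                                 # self-loop
     ("v1", "v2"),
     ("v2", "v3"), ("v2", "v3"),                   # parallel edges
     ("v2", "v4"), ("v3", "v4"), ("v4", "v5")]

d = degrees(V, E)
```

Here the degrees sum to 14 = 2 · 7, and the vertices of odd degree (v1, v3, v4, v5) are even in number, as Theorem 81 demands.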

Definition 121. A graph that has neither self-loops nor parallel edges is
called a simple graph. In some graph theory literature, a graph is defined
to be simple, but in most engineering applications it is necessary that parallel
edges and self-loops be allowed; this is why our definition includes graphs with
self loops and/or parallel edges.
It should also be noted that in drawing a graph, whether the lines are drawn straight or curved, long or short is immaterial. What is important is the incidence between the edges and vertices. For example, the two graphs drawn in Figure 11.24 are the same, because the incidence between edges and vertices is the same in both cases.


FIGURE 11.24: Same Graph Drawn Differently

In a diagram of a graph, two edges may seem to intersect at a point that does not represent a vertex; for example, edges e and f in Figure 11.25. Such edges should be thought of as being in different planes, thus having no common point. (Some authors break one of the two edges at such a crossing to emphasize this fact.)

Definition 122. A graph is also called a linear complex, a 1-complex, or a one-dimensional complex. A vertex is also referred to as a node, a junction, a point, a 0-cell, or a 0-simplex. Other terms used for an edge are a branch, a line, an element, a 1-cell, an arc, and a 1-simplex. In this book we generally use the terms graph, vertex, and edge.

FIGURE 11.25: Edges e and f Lack Common Points

Applications of Graphs
Graph theory has a very wide range of applications in engineering and other
areas. A graph can be used to represent almost any physical situation involv-
ing discrete objects and a relationship among them. The following are four
examples from hundreds of such applications.
Konigsberg Bridge problem
The Konigsberg bridge problem is perhaps the best-known example in graph
theory. It was a long-standing problem until solved by Leonhard Euler (1707-
1783) in 1736, by means of a graph. Euler wrote the first paper in graph
theory and thus became the originator of the theory the rest of topology. The
problem is depicted in Figure

FIGURE 11.26: Konigsberg Bridge Problem

Two islands, C and D, formed by the Pregel River in Königsberg were
connected to each other and to the banks A and B with seven bridges, as
shown in Figure 11.26. The problem was to start at any of the four land areas
of the city, A, B, C, or D, walk over each of the seven bridges exactly once,
and return to the starting point (without swimming across the river, of
course). Euler represented this situation by means of a graph, as shown in
Figure 11.27.
The vertices represent the land areas and the edges represent the bridges.
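Euler's negative answer can be checked mechanically: a connected multigraph admits a closed walk traversing every edge exactly once if and only if every vertex has even degree. A small sketch (the edge list encodes the seven bridges with the land-area labels used in the text):

```python
from collections import Counter

# Königsberg multigraph: land areas A, B, C, D; the seven bridges as edges.
bridges = [("A", "C"), ("A", "C"), ("A", "D"),
           ("B", "C"), ("B", "C"), ("B", "D"),
           ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# An Euler circuit exists in a connected multigraph iff all degrees are even.
has_euler_circuit = all(d % 2 == 0 for d in degree.values())
print(dict(degree))       # every land area has odd degree
print(has_euler_circuit)  # False: the desired walk is impossible
```

Since all four land areas have odd degree, no routing of the walk can succeed, which is exactly Euler's conclusion.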
704 Modern Engineering Mathematics


FIGURE 11.27: Graph of Königsberg bridge problem

Example 330. (Utilities problem) There are three houses H1 , H2 and H3
to be connected to three utilities: water (W ), gas (G), and electricity (E)
by means of conduits. Is it possible to make such connections without any
crossovers of the conduits?

FIGURE 11.28: Three-Utilities Problem

Figure 11.29 shows how this problem can be represented by a graph; the
conduits are shown as edges while the houses and utility supply centers are
vertices. Thus the answer to the problem is no: this graph cannot be drawn
in the plane without crossings.

Electrical network problems: Properties (such as transfer function and
input impedance) of an electrical network are functions of only two factors:
1. The nature and value of the elements forming the network, such as
resistors, inductors, transistors, and so forth.
2. The way these elements are connected, that is, the topology of the net-
work.

FIGURE 11.29: Graph of Three-Utilities Problem

Since there are only a few different types of electrical elements, the varia-
tions in networks are chiefly due to the variations in topology. Thus electrical
network analysis and synthesis are mainly studies of network topology. In
the topological study of electrical networks, factor 2 is separated from factor 1 and
is studied independently. The topology of a network is studied by means of
its graph. In drawing a graph of an electrical network the junctions are rep-
resented by vertices, and branches (which consist of electrical elements) are
represented by edges, regardless of the nature and size of the electrical ele-
ments. An electrical network and its graph are shown in Figures 11.30a and
11.30b.
Definition 123. A graph in which all vertices are of equal degree is called a
regular graph (or simply a regular).

Example 331. The graph of the three-utilities problem shown in Figure 11.29
is regular of degree 3.
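This can be checked directly: the three-utilities graph is the complete bipartite graph K3,3, in which every vertex has degree 3. A minimal sketch:

```python
# Graph of the three-utilities problem (K3,3): houses H1..H3, utilities W, G, E.
houses = ["H1", "H2", "H3"]
utilities = ["W", "G", "E"]
edges = [(h, u) for h in houses for u in utilities]  # every house to every utility

degree = {v: 0 for v in houses + utilities}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Every vertex has degree 3, so the graph is regular of degree 3.
print(sorted(set(degree.values())))  # [3]
```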

Example 332. (Seating problem) Nine members of a new club meet each
day for lunch at a round table. They decide to sit such that every member
has different neighbors at each lunch. How many days can this arrangement
last?

FIGURE 11.30a: Electrical Network and Graph

FIGURE 11.30b: Electrical Network and Graph

This situation can be represented by a graph with nine vertices such that
each vertex represents a member, and an edge joining two vertices represents
the relationship of sitting next to each other. Figure 11.31 shows two possible
seating arrangements: 1 2 3 4 5 6 7 8 9 1 (solid lines) and 1 3 5 2 7 4 9 6 8 1
(dashed lines). It can be shown by graph-theoretic considerations that there
are only two more arrangements possible: 1 5 7 3 9 2 8 4 6 1 and 1 7 9 5 8 3
6 2 4 1. In general it can be shown that for n people the number of such
possible arrangements is (n − 1)/2 if n is odd and (n − 2)/2 if n is even.
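The four arrangements quoted above can be verified mechanically: as cycles on nine vertices they are pairwise neighbor-disjoint, and together they exhaust all 36 unordered pairs of members, so a fifth arrangement is impossible. A sketch:

```python
def neighbor_pairs(cycle):
    """Unordered pairs of adjacent members in a circular seating."""
    n = len(cycle)
    return {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}

# The four arrangements quoted in the text (each listed once around the table).
arrangements = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 2, 7, 4, 9, 6, 8],
    [1, 5, 7, 3, 9, 2, 8, 4, 6],
    [1, 7, 9, 5, 8, 3, 6, 2, 4],
]

pair_sets = [neighbor_pairs(a) for a in arrangements]
# No two arrangements share a neighboring pair ...
disjoint = all(pair_sets[i].isdisjoint(pair_sets[j])
               for i in range(4) for j in range(i + 1, 4))
# ... and together they use all 4 * 9 = 36 = C(9,2) pairs, so no fifth exists.
print(disjoint, len(set().union(*pair_sets)))  # True 36
```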

Finite and Infinite Graphs


Although in our definition of a graph neither the vertex set V nor the edge


FIGURE 11.31: Dinner Table Arrangements

set E need be finite, in most theories and applications these sets are finite. A
graph with a finite number of vertices and a finite number of edges is called
a finite graph; otherwise, a graph is infinite. The graphs in Figures 11.23,
11.26, 11.27 and 11.30b are all examples of finite graphs. Portions of two
infinite graphs are shown in Figure 11.32.

FIGURE 11.32: Portions of Two Infinite Graphs

Isolated Vertex, Pendant Vertex, and Null Graph

Definition 124. A vertex having no incident edge is called an isolated
vertex. In other words, isolated vertices are vertices with zero degree.
Example 333. Vertices v4 and v7 in Figure 11.33 are isolated vertices.

Definition 125. A vertex of degree 1 is called a pendant vertex or an end
vertex. The vertex v3 in Figure 11.33 is a pendant vertex.

FIGURE 11.33: Graph Containing Isolated Vertices, Series Edges, and Pendant Vertex

Definition 126. Two adjacent edges are said to be in series if their common
vertex is of degree 2.

Example 334. In Figure 11.33 the two edges incident on v1 are in series.


FIGURE 11.34: Null Graph of Six Vertices

In the definition of a graph G = (V, E), it is possible for the edge set E to
be empty. Such a graph, without any edges, is called a null graph. In other
words, every vertex in a null graph is an isolated vertex. A null graph of six
vertices is shown in Figure 11.34. Although the edge set E may be empty, the
vertex set V must not be empty; otherwise, there is no graph. In other words,
by definition, a graph must have at least one vertex.
Applications of graph theory to linguistics have attracted the attention of a
fair number of scholars in non-engineering fields, for example, Morgan
Sanderegger, an assistant linguistics professor at McGill University in
Montreal [mcgill.ca/∼morgen/rlhijmeGraphs.pdf].

11.7 Applications of Spline Polynomials


In mathematics a spline is a special function defined piecewise by polynomials.
In computer science the term spline refers to a piecewise polynomial curve.
A piecewise polynomial function f (x) is obtained by dividing an interval X
into contiguous subintervals and representing f (x) by a separate polynomial
on each subinterval. The polynomials are joined at the interval endpoints
(knots) in such a way that a certain degree of smoothness of the resulting
function is guaranteed.

11.7.1 Polynomial Interpolation


In numerical analysis, polynomial interpolation is the interpolation of a
given data set by a polynomial: given some points, find a polynomial which
goes exactly through these points. Given a set of n + 1 data points (xi , yi )
where no two xi are the same, one is looking for a polynomial p of degree
at most n with the property p(xi ) = yi , i = 0, 1, . . . , n.
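The defining property p(xi ) = yi can be exercised directly with the Lagrange form of the interpolant (a pure-Python sketch; the data points below are illustrative):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree <= n interpolating polynomial at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Three points lying on p(x) = x**2 + 1; the interpolant must reproduce it.
xs, ys = [0.0, 1.0, 3.0], [1.0, 2.0, 10.0]
print(lagrange_eval(xs, ys, 2.0))  # 5.0, since p(2) = 2**2 + 1
```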
Applications
Polynomials can be used to approximate complicated curves, for example, the
shapes of letters in typography, given a few points. A relevant application is
the evaluation of the natural logarithm and trigonometric functions: pick a
few known data points, create a lookup table, and interpolate between those
data points. This results in significantly faster computations. Polynomial
interpolation also forms the basis for algorithms in numerical quadrature and
numerical ordinary differential equations, as well as in secure multiparty
computation and secret sharing schemes.
Polynomial interpolation is also essential to perform sub-quadratic mul-
tiplication and squaring such as Karatsuba multiplication and Toom-Cook
multiplication, where an interpolation through points on a polynomial which
defines the product yields the product itself. For example, given a = f (x) =
a0 x0 + a1 x1 + . . . and b = g(x) = b0 x0 + b1 x1 + . . . , the product ab is
equivalent to W (x) = f (x)g(x). Finding points along W (x) by substituting
small values for x in f (x) and g(x) yields points on the curve. Interpolation based
on those points will yield the terms of W (x) and subsequently the product ab.
In the case of Karatsuba multiplication this technique is substantially faster
than quadratic multiplication, even for modest-sized inputs. This is especially
true when implemented in parallel hardware.
Definition 127. Polynomial interpolation is a method of estimating values
between known data points. When graphical data contains a gap, but data is
available on either side of the gap or at a few specific points within the gap,
an estimate of values within the gap can be made by interpolation.
The simplest method of interpolation is to draw straight lines between the
known data points and consider the function a combination of those straight
lines. This method, called linear interpolation, usually introduces considerable
error. A more precise approach uses a polynomial function to connect the
points. A polynomial is a mathematical expression comprising a sum of terms,
each term including a variable or variables raised to a power and multiplied
by a coefficient. The simplest polynomials have one variable. Polynomials can
exist in factored form or written out in full. For example:

(x − 4)(x + 2)(x + 10)

x2 + 2x + 1
3y 3 − 8y 2 + 4y − 2.
The value of the largest exponent is called the degree of the polynomial.
If a set of data contains n known points, then there exists exactly one poly-
nomial of degree n − 1 or smaller that passes through all of those points. The
polynomial’s graph can be thought of as “filling in the curve” to account for
data between the known points. This methodology, known as polynomial in-
terpolation, often (but not always) provides more accurate results than linear
interpolation. The main problem with polynomial interpolation arises from the
fact that even when a certain polynomial function passes through all known
data points, the resulting graph might not reflect the actual state of affairs.
It is possible that a polynomial function, although accurate at specific
points, will differ wildly from the true values at some regions between those
points. This problem most often arises when “spikes” or “dips” occur in a
graph, reflecting unusual or unexpected events in a real-world situation. Such
anomalies are not reflected in the simple polynomial function which, even
though it might make perfect mathematical sense, cannot take into account
the chaotic nature of events in the physical universe.
Example 335. Given data points (x0 , y0 ) = (2, 3) and (x1 , y1 ) = (5, 8),
find y at x = 4.

FIGURE 11.35

TABLE 11.1

x f(x)
2.0 0.85467
2.3 0.75682
2.6 0.43126
2.9 0.22364
3.2 0.08567

P (x) should satisfy the conditions P (2) = 3 and P (5) = 8. These hold if we
write P (x) = y0 L0 (x) + y1 L1 (x) with basis polynomials such that at
x = x0 = 2, L0 (x) = 1 and L1 (x) = 0, and at x = x1 = 5, L0 (x) = 0 and
L1 (x) = 1; that is,

L0 (x) = (x − x1 )/(x0 − x1 ),    L1 (x) = (x − x0 )/(x1 − x0 ).

The Lagrange interpolating polynomial passing through three given points
(x0 , y0 ), (x1 , y1 ) and (x2 , y2 ) is built the same way:

P (x) = y0 L0 (x) + y1 L1 (x) + y2 L2 (x),

where at x0 , L0 (x) becomes 1 and at all other given data points L0 (x) is 0;
at x1 , L1 (x) becomes 1 and at all other given data points L1 (x) is 0; and at
x2 , L2 (x) becomes 1 and at all other given data points L2 (x) is 0. In general,
the Lagrange interpolating polynomial through n + 1 points is

P (x) = y0 L0 (x) + y1 L1 (x) + · · · + yn Ln (x),

where Li (x) is the product of the factors (x − xj )/(xi − xj ) over all j ≠ i.
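Example 335 can be finished numerically; the sketch below evaluates the two-point Lagrange form at x = 4, which gives y = 19/3 ≈ 6.33:

```python
# Example 335 worked numerically: data (x0, y0) = (2, 3), (x1, y1) = (5, 8).
x0, y0, x1, y1 = 2.0, 3.0, 5.0, 8.0

def P(x):
    L0 = (x - x1) / (x0 - x1)   # 1 at x0, 0 at x1
    L1 = (x - x0) / (x1 - x0)   # 0 at x0, 1 at x1
    return y0 * L0 + y1 * L1

print(P(2.0), P(5.0))  # 3.0 8.0 (the interpolation conditions)
print(P(4.0))          # 6.333... = 19/3
```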
Newton’s Interpolating Polynomials
Newton’s equation of a function that passes through two points (x0 , y0 ) and
(x1 , y1 ) is
P (x) = a0 + a1 (x − x0 ).
The coefficients of Newton’s interpolating polynomial are:
Example 336. Find Newton's interpolating polynomial for the function
whose 5 data points are given in Table 11.1, and use it to approximate the
function at x = 2.8; the result is P (2.8) ≈ 0.275.

The divided differences are computed recursively:

f [xi−1 , xi ] = (f [xi ] − f [xi−1 ])/(xi − xi−1 ),
f [xi−2 , xi−1 , xi ] = (f [xi−1 , xi ] − f [xi−2 , xi−1 ])/(xi − xi−2 ),

and so on for higher orders. For the given data the divided-difference table is:

i    xi    f [xi ]    f [xi−1 , xi ]    f [xi−2 , xi−1 , xi ]    f [xi−3 , . . . , xi ]    f [xi−4 , . . . , xi ]
0 2.0 0.85467
-0.32617
1 2.3 0.75682 -1.26505
-1.08520 2.13363
2 2.6 0.43126 0.65522 -2.02642
-0.69207 -0.29808
3 2.9 0.22364 0.38695
-0.45990
4 3.2 0.08567
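The divided-difference recursion and a Horner-style evaluation of the Newton form can be sketched as follows (pure Python; the data reproduce the table above and the quoted value P (2.8) ≈ 0.275):

```python
# Divided-difference table and Newton evaluation for the data of Example 336.
xs = [2.0, 2.3, 2.6, 2.9, 3.2]
ys = [0.85467, 0.75682, 0.43126, 0.22364, 0.08567]

def newton_coefficients(xs, ys):
    """Top edge of the divided-difference table: f[x0], f[x0,x1], ..."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Work downward so lower-order differences are still available.
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

def newton_eval(xs, coef, x):
    """Horner-style evaluation of the Newton form."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

coef = newton_coefficients(xs, ys)
print(round(coef[1], 5))                     # -0.32617, first divided difference
print(round(newton_eval(xs, coef, 2.8), 3))  # 0.275
```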

11.7.2 Spline Interpolation


This method is used to construct a function that passes through a discrete
set of known data points. In the mathematical field of numerical analysis,
spline interpolation is a form of interpolation where the interpolant is a
special type of piecewise polynomial called a spline.
Given n + 1 distinct knots xi such that x0 < x1 < · · · < xn , with n + 1 knot
values yi , find a spline function with each Si (x) a polynomial of degree at
most n:

S(x) := S0 (x),    x ∈ [x0 , x1 ]
        S1 (x),    x ∈ [x1 , x2 ]
        ...
        Sn−1 (x),  x ∈ [xn−1 , xn ].

FIGURE 11.36

Linear spline interpolation: The points are connected by straight lines;
each Si is the linear function

Si (x) = yi + ((yi+1 − yi )/(xi+1 − xi )) (x − xi ),

which is continuous at each data point.
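A direct transcription of the linear spline formula (a sketch; bisect locates the interval containing x, and the data are illustrative):

```python
import bisect

def linear_spline(xs, ys, x):
    """Evaluate the linear interpolating spline at x (xs strictly increasing)."""
    i = bisect.bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))          # clamp to a valid interval
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])

xs, ys = [0.0, 1.0, 2.0, 4.0], [0.0, 1.0, 0.0, 2.0]
print(linear_spline(xs, ys, 0.5))  # 0.5, halfway up the first segment
print(linear_spline(xs, ys, 3.0))  # 1.0, halfway along the last segment
```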

Quadratic spline interpolation: The quadratic spline can be constructed
as

Si (x) = yi + zi (x − xi ) + ((zi+1 − zi )/(2(xi+1 − xi ))) (x − xi )².

The coefficients can be found by choosing a z0 and then using the recurrence
relation

zi+1 = −zi + 2 (yi+1 − yi )/(xi+1 − xi ).
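The construction can be checked on data sampled from y = x² (an illustrative choice; z0 = 0 is the free initial-slope parameter):

```python
# Quadratic spline via the recurrence z_{i+1} = -z_i + 2*(y_{i+1}-y_i)/(x_{i+1}-x_i).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]   # samples of y = x**2

z = [0.0]                   # free choice of z0 (here: zero initial slope)
for i in range(len(xs) - 1):
    z.append(-z[i] + 2.0 * (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]))

def S(i, x):
    """Quadratic piece on [xs[i], xs[i+1]]."""
    h = xs[i + 1] - xs[i]
    return ys[i] + z[i] * (x - xs[i]) + (z[i + 1] - z[i]) / (2 * h) * (x - xs[i]) ** 2

# Each piece interpolates both endpoints, so adjacent pieces meet continuously.
print([round(S(i, xs[i + 1]), 10) for i in range(3)])  # [1.0, 4.0, 9.0]
```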
Cubic spline
All splines considered here are cubic splines; they are all piecewise cubic
functions. However, "cubic spline" usually means a special cubic spline with
continuous first and second derivatives. The cubic spline is specified by the
function values at the nodes and by derivative values (either first or second
derivatives) at the ends of the interpolation interval.

• If the exact values of the first derivative in both boundaries are known,


FIGURE 11.37: Quadratic Spline for f (x) = 1/(1 + x2 )

the spline is called a clamped spline, or spline with exact boundary
conditions. This spline has interpolation error O(h4 ).
• If the value of the first (or second) derivative is unknown, we can set the
so-called natural boundary conditions S ′′ (A) = 0, S ′′ (B) = 0. Thus, we
get a natural spline with interpolation error O(h2 ). The closer to the
boundary nodes, the greater the error becomes. In the inner nodes the
interpolation accuracy is much better.
• One more boundary condition we can use when boundary derivatives are
unknown is the parabolically terminated spline. The boundary interval
is represented as the second (instead of the third) degree polynomial (for
inner intervals, third degree polynomials are still used). In a number of
cases this provides better accuracy than natural boundary conditions.
• We can also set periodic boundary conditions (this kind of conditions is
used to model periodic functions).
Finally, we can combine different types of boundary conditions for different
boundaries. It does make sense if we have only partial information about the
function behavior at the boundaries (e.g., we know the left boundary deriva-
tive, and have no information about the right boundary derivative).
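As a sketch of how the natural case is computed (not the textbook's own code), the standard tridiagonal system for the knot second derivatives Mi can be solved in pure Python with the Thomas algorithm:

```python
import bisect

# Natural cubic spline (S''(A) = S''(B) = 0) via the Thomas algorithm.
def natural_cubic_spline(xs, ys):
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Rows 0 and n encode the natural conditions M_0 = M_n = 0.
    a = [0.0] * (n + 1)   # sub-diagonal
    b = [1.0] * (n + 1)   # diagonal
    c = [0.0] * (n + 1)   # super-diagonal
    d = [0.0] * (n + 1)   # right-hand side
    for i in range(1, n):
        a[i], b[i], c[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n + 1):          # forward elimination
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * (n + 1)                # second derivatives at the knots
    M[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):     # back substitution
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def S(x):
        i = max(0, min(bisect.bisect_right(xs, x) - 1, n - 1))
        t1, t2 = xs[i + 1] - x, x - xs[i]
        return (M[i] * t1 ** 3 / (6 * h[i]) + M[i + 1] * t2 ** 3 / (6 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6) * t1
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6) * t2)
    return S

# A natural spline through collinear data has all M_i = 0 and stays linear.
spline = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(spline(1.5))  # 4.0
```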
Akima spline
The Akima spline is a special type of spline which is robust to outliers. The
disadvantage of cubic splines is that they may oscillate in the neighborhood
of an outlier. Figure 11.38 shows a set of points containing one outlier. On
the intervals next to the outlier the cubic spline noticeably deviates from the
given function; in contrast, the Akima spline is much less affected by the
outlier. An important property of the Akima spline is its locality: function
values in [xi , xi+1 ] depend on fi−2 , fi−1 , fi , fi+1 , fi+2 , fi+3 only. A second
property which should be taken into account is the non-linearity of Akima
spline interpolation: the result of interpolating the sum of two functions does
not equal the sum of the interpolants constructed from the given functions.
No fewer than five points are required to construct an Akima spline. In the
inner area (i.e., between x2 and xN −3 when the index runs from 0 to N − 1)
the interpolation error has order O(h2 ).

FIGURE 11.38: Original Function with Cubic and Akima Spline Interpolants

11.8 Compression Sensing


Compressed sensing (also known as compressive sensing, compressive sam-
pling, or sparse sampling) is a signal processing technique for efficiently ac-
quiring and reconstructing a signal, by finding solutions to underdetermined
linear systems. This is based on the principle that, through optimization, the
sparsity of a signal can be exploited to recover it from far fewer samples than
required by the Shannon-Nyquist sampling theorem. There are two conditions
under which recovery is possible. The first requires the signal to be sparse
in some domain. The second is incoherence, which is applied through the
restricted isometry property, a condition sufficient for sparse signals.
A common goal of the engineering field of signal processing is to reconstruct
a signal from a series of sampling measurements. In general, this task is im-
possible because there is no way to reconstruct a signal during the times that
the signal is not measured. Nevertheless, with prior knowledge or assumptions
about the signal, it is possible to perfectly reconstruct a signal from a series
of measurements. Over time, engineers have improved their understanding of
which assumptions are practical and how they can be generalized.
An early breakthrough in signal processing was the Nyquist–Shannon sam-
pling theorem. It states that if the signal’s highest frequency is less than half
of the sampling rate, then the signal can be reconstructed perfectly. The main
idea is that with prior knowledge about constraints on the signal’s frequencies,
fewer samples are needed to reconstruct the signal.
Underdetermined linear system
An underdetermined system of linear equations has more unknowns than equa-
tions and generally has an infinite number of solutions. In order to choose a
solution to such a system, one must impose extra constraints or conditions
(such as smoothness) as appropriate. In compressed sensing, one adds the
constraint of sparsity, allowing only solutions which have a small number of
non-zero coefficients. Not all underdetermined systems of linear equations have
sparse solutions. However, if there is a unique sparse solution to the underde-
termined system, then the compressed sensing framework allows the recovery
of that solution.
Solution reconstruction method
Compressed sensing (CS) takes advantage of the redundancy in many inter-
esting signals—they are not pure noise. In particular, many signals are sparse,
that is, they contain many coefficients close to or equal to zero, when repre-
sented in some domain. This is the same insight used in many forms of lossy
compression.
Compressed sensing typically starts with taking a weighted linear combi-
nation of samples also called compressive measurements on a basis different
from the basis in which the signal is known to be sparse. Therefore, the task
of converting the image back into the intended domain involves solving an
underdetermined matrix equation since the number of compressive measure-
ments taken is smaller than the number of pixels in the full image. However,
adding the constraint that the initial signal is sparse enables one to solve this
underdetermined system of linear equations.
The least-squares solution to such problems is to minimize the L2 norm,
that is, minimize the amount of energy in the system. This is usually sim-
ple mathematically (involving only a matrix multiplication by the pseudo-
inverse of the basis sampled). However, this leads to poor results for many
practical applications, for which the unknown coefficients have non-zero en-
ergy. To enforce the sparsity constraint when solving for the underdetermined
system of linear equations, one can minimize the number of non-zero
components of the solution. The function counting the number of non-zero
components of a vector is called the L0 norm, although it is not a true norm;
in practice this combinatorial problem is usually relaxed to minimizing the
L1 norm.
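A toy illustration of why sparsity selects a unique solution of an underdetermined system: a brute-force search over supports plays the role of L0 minimization (feasible only at this scale; practical CS solvers relax the problem to L1). The matrix and right-hand side below are illustrative:

```python
from itertools import combinations

# Underdetermined system A x = b: 2 equations, 4 unknowns.
A = [[1.0, 2.0, 0.5, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
b = [2.0, 6.0]   # constructed as 2 * (fourth column), so a 1-sparse solution exists

def solve_support(cols):
    """Try to solve A_S x_S = b exactly on a support of size 1 or 2."""
    if len(cols) == 1:
        j = cols[0]
        if A[0][j] != 0:
            t = b[0] / A[0][j]
        elif A[1][j] != 0:
            t = b[1] / A[1][j]
        else:
            return None
        ok = all(abs(A[r][j] * t - b[r]) < 1e-9 for r in range(2))
        return {j: t} if ok else None
    (j, k) = cols
    det = A[0][j] * A[1][k] - A[0][k] * A[1][j]   # Cramer's rule on the 2x2 block
    if abs(det) < 1e-12:
        return None
    xj = (b[0] * A[1][k] - A[0][k] * b[1]) / det
    xk = (A[0][j] * b[1] - b[0] * A[1][j]) / det
    return {j: xj, k: xk}

# L0-style search: smallest support first.
for size in (1, 2):
    hits = [s for c in combinations(range(4), size) if (s := solve_support(list(c)))]
    if hits:
        print(size, hits[0])  # 1 {3: 2.0}
        break
```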
Applications: The field of compressive sensing is related to several topics

in signal processing and computational mathematics, such as underdetermined
linear systems, group testing, heavy hitters, sparse coding, multiplexing,
sparse sampling, and finite rate of innovation. Its broad scope and generality
have enabled several innovative CS-enhanced approaches in signal processing
and compression, solution of inverse problems, design of radiating systems,
radar and through-the-wall imaging, and antenna characterization. Imaging
techniques having a strong affinity with compressive sensing include coded
aperture and computational photography. Implementations of compressive
sensing in hardware at different technology readiness levels are available.
Photography: Compressed sensing is used in mobile phone camera sensors.
The approach allows a reduction in image acquisition energy per image by
as much as a factor of 15 at the cost of complex decompression algorithms;
the computation may require an off-device implementation. Image quality im-
proves with the number of snapshots, and generally requires a small fraction
of the data of conventional imaging, while eliminating lens and focus-related
aberrations.
Holography: Compressed sensing can be used to improve image
reconstruction in holography by increasing the number of voxels; it is also
used for image retrieval from undersampled measurements in optical and
millimeter-wave holography.
Facial recognition: Compressed sensing is used in facial recognition appli-
cations.
Magnetic resonance imaging: Compressed sensing has been used to
shorten magnetic resonance imaging scanning sessions using conventional
hardware.
Compressed sensing addresses the issue of high scan time by enabling faster ac-
quisition by measuring fewer Fourier coefficients. This produces a high-quality
image with relatively lower scan time. Another application (also discussed
ahead) is for Computerized Tomography (CT) reconstruction with fewer X-
ray projections. Compressed sensing, in this case, removes the high spatial
gradient parts, mainly, image noise and artifacts. This holds tremendous po-
tential as one can obtain high-resolution CT images at low radiation doses
(through lower current-mA settings).
Network tomography: Compressed sensing has shown outstanding results
in the application of network tomography to network management. Network
delay estimation and network congestion detection can both be modeled as
underdetermined systems of linear equations where the coefficient matrix is
the network routing matrix.
Shortwave-infrared cameras: Commercial shortwave-infrared cameras
based upon compressed sensing are available. These cameras have light
sensitivity from 0.9 µm to 1.7 µm, wavelengths invisible to the human eye.

11.9 Applications of Lozi Mappings


In 1978, Lozi introduced a two-dimensional map whose equations and
attractors resemble those of the celebrated Hénon map. Simply, a quadratic
term in the latter is replaced with a piecewise linear contribution in the
former. This allows one to rigorously prove the chaotic character of some
attractors and generate a detailed analysis of their basins of attraction. The
iterated Lozi map L(x, y) is given by

L(x, y) = (1 − a|x| + y, bx)

or

L1 (x, y) = (1 − a|x| + by, x).

These two formulas are dynamically equivalent, which means that their
dynamical behaviors are the same. Figure 11.39 shows the Lozi attractor with
its basin of attraction, obtained for a = 1.7052 and b = 0.5896. For details,
see [9].
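A short sketch of the map and its fixed point in the half-plane x > 0; the perturbation experiment illustrates the sensitive dependence on initial conditions that underlies the chaotic behavior (the 1e-6 offset and the iteration count are arbitrary choices):

```python
# The Lozi map with the parameter values quoted above.
a, b = 1.7052, 0.5896

def lozi(x, y):
    return 1.0 - a * abs(x) + y, b * x

# Fixed point in the half-plane x > 0: solve x = 1 - a*x + b*x.
x_star = 1.0 / (1.0 + a - b)
y_star = b * x_star
print(lozi(x_star, y_star))  # returns (x_star, y_star) up to rounding

# Sensitive dependence: a tiny perturbation of the fixed point is amplified.
x, y = x_star + 1e-6, y_star
max_sep = 0.0
for _ in range(40):
    x, y = lozi(x, y)
    max_sep = max(max_sep, abs(x - x_star))
print(max_sep > 1e-3)  # True: the 1e-6 offset has grown by orders of magnitude
```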

FIGURE 11.39

11.9.1 Lozi Mappings and Secure Communications


A first, basic application of chaos theory concerns control of irregular
behaviors in devices and systems. The list of applications includes, but is not
limited to: engineering, computers, communications, medicine and biology,
management and finance, and consumer electronics. The potential
application types of chaos are: control, synthesis, synchronization and
information processing.
Particle swarm optimization (PSO) is a population-based swarm algorithm
developed by Eberhart and Kennedy [59], [60]. PSO is an evolutionary
optimization approach based on a population in which the position of each
member or particle is a potential solution to a problem. Each member or
particle is associated with a randomized velocity that moves throughout the
problem space [61].
Araujo and Coelho [62] proposed a chaotic PSO (CPSO) approach
intertwined with Lozi map chaotic sequences to develop the Takagi-Sugeno
(TS) fuzzy model to represent dynamical behaviors. Their method was used
to optimize the premise part of the if-then rules of the TS model that utilizes
the least mean squares technique.
As a practical example, the CPSO approach was utilized for a thermal
vacuum system used in space environment emulation and satellite qualifica-
tion. The results succeeded in eliciting a TS fuzzy model for non-linear and
time-delay applications. Future research in this area should investigate the hy-
bridization of the PSO and SPSO and other search methods such as simulated
annealing and quasi-Newton methods.

11.10 Introduction to Maxwell Equations with Applications
Maxwell’s equations are a set of fundamental relationships which govern
how electric and magnetic fields interact. The equations explain how these
fields are generated and interact and their relationships to charge and cur-
rent. They form the backbone of modern electrical and telecommunication
technology and are often quoted as being the most important equations of
all time. The equations consist of a set containing Gauss’ electric field law,
Gauss’ magnetic field law, Faraday’s law and the Ampere Maxwell law.

1. Gauss’ electric field law relates the distribution of electric charge to


the field the charge creates. If you know the shape of an object and how
the charge is distributed, you can use Gauss’ law to find an expression
for the electric field. This law is generally used when there’s a degree of
symmetry, making the equation simpler.

2. Gauss’ magnetic law states that magnetic monopoles do not exist.


It’s really more of a statement than a formula we might use to derive
expressions. Charges exist as positive or negative. But in magnetism,
whenever you have a south pole, you also have a north pole. No single
poles, or monopoles have been yet discovered.

3. Faraday’s law states that any change to the magnetic environment of a


coil of wire will cause a voltage to be induced in the coil. If the magnetic
field strength changes, the magnet moves, the coil moves, or the coil is
rotated, the change will create a voltage in the coil.
4. Ampere Maxwell law says that the magnetic field created by an
electric current is proportional to the size of that electric current, with
a constant of proportionality equal to the permeability of free space.
Stationary charges produce electric fields, proportional to the magnitude
of that charge. But moving charges produce magnetic fields, proportional
to the current (the charge and movement).
(a) ∇ · D = ρV
(b) ∇ · B = 0
(c) ∇ × E = −∂B/∂t
(d) ∇ × H = ∂D/∂t + J

where D, B, E, H and J are explained in Section 3.8.


Applications of Maxwell Equations

FIGURE 11.40

The uses and applications of Maxwell’s equations are too many to count. By
understanding electromagnetism we’re able to create images of the body using
MRI scanners; we’ve created magnetic tape, generated electricity, and built
computers. Any device that uses electricity or magnets is on a fundamental
level built upon the original Maxwell equations. To describe the weak force,
physicists drew analogies to electromagnetism, and eventually found them-
selves a step higher up the unification ladder. Their ideas suggested that the

two forces were, in fact, just two sides of the same coin: the unified elec-
troweak force. The Universe would be completely different if the weak force
were not weak. The idea of unification suggests that the similarity of the two
forces, electromagnetism and the weak force, was only apparent right after the
Big Bang, when the Universe was incredibly hot. As temperatures cooled, the
forces crystallized and became different. Weird as it might seem, the concept
isn’t entirely unfamiliar: think of the dramatic change that happens to water
when it freezes to ice.
The laws of nature responsible for the behavior of water are the same
everywhere and do not favor any particular direction in space. This is why a
patch of ocean looks much like any other, and appears the same no matter
from what direction you look at it. The icebergs that form when the water
freezes, however, display none of that symmetry: no two look the same. Those
with rotational symmetry are extremely rare. As the Universe cooled, the
messenger particles of the weak force (and other particles) acquired mass,
while the messenger particles of electromagnetism remained massless.
Lightning is an electrostatic discharge between electrically charged regions
of clouds during an electrical storm. Electroweak unification was a real triumph
of theoretical physics and led to Nobel Prizes in Physics for Sheldon Glashow,
Abdus Salam, and Steven Weinberg for the unified electroweak framework,
and for François Englert and Peter Higgs for the description of the mass-
related symmetry breaking mechanism. For detailed study we refer to Neun-
zert and Siddiqi [35] and Monk [31].

11.11 Stochastic Calculus for Engineering Problems


Stochastic calculus is a branch of mathematics that operates on stochastic
processes. It allows a consistent theory of integration to be defined for inte-
grals of stochastic processes and is used to model systems that behave ran-
domly. The main types of stochastic calculus are the Itô calculus and its
variational relative known as the Malliavin calculus. For technical reasons the
Itô integral is the most useful for general classes of processes. We present here
the Itô lemma. Interested readers may pursue references [19], [30], [50] and [54]
for detailed study.
Brownian Motion
Brownian motion, also called Brownian movement, involves various physical
phenomena in which some quantity is constantly undergoing small, random
fluctuations. It was named for the Scottish botanist Robert Brown, the first
scientist to study such fluctuations (1827).

Example 337. 1. The random motion of particles suspended in a fluid (a
liquid or a gas) resulting from their collision with the fast-moving atoms or
molecules in the gas or liquid.

2. Random Walk

FIGURE 11.41: Random Walk

Brownian motion is among the simplest of the continuous-time stochastic (or
probabilistic) processes, and serves as a limit of simpler and more complicated
stochastic processes. This universality is closely related to the universality of
the normal distribution. In both cases, mathematical convenience rather than
the accuracy of the models motivates their use.
Brownian motion is a continuous-time random process denoted W (t) or
Wt , defined for t ≥ 0 such that W (0) takes some predetermined value, usually
0, and for each 0 ≤ s < t, W (t) − W (s) has a normal distribution with mean
µ(t − s) and variance σ 2 (t − s). The parameters µ and σ are the drift and the
diffusion parameters of the Brownian motion; in the special case µ = 0 and
σ = 1, W (t) is referred to as a standard Brownian motion or a Wiener
process. Another property of the Brownian motion process is that the sample
paths are continuous functions of t with probability 1.
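The defining property W (1) − W (0) ∼ Normal(0, 1) of a standard Brownian motion can be checked by Monte Carlo (a sketch; the path length, sample count and seed are arbitrary choices):

```python
import random

random.seed(0)

def brownian_path(T=1.0, n=500, mu=0.0, sigma=1.0):
    """Sample W at n+1 equally spaced times via independent Gaussian increments."""
    dt = T / n
    w, path = 0.0, [0.0]
    for _ in range(n):
        # W(t+dt) - W(t) ~ Normal(mu*dt, sigma**2 * dt), independent of the past.
        w += random.gauss(mu * dt, sigma * dt ** 0.5)
        path.append(w)
    return path

# For standard Brownian motion, W(1) ~ Normal(0, 1): check by Monte Carlo.
samples = [brownian_path(n=50)[-1] for _ in range(5000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(abs(mean) < 0.1, abs(var - 1.0) < 0.2)  # True True
```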

11.11.1 Stochastic Integration


The stochastic integral arose from attempts to use the techniques of
Riemann-Stieltjes integration for stochastic processes. However, Riemann
integration requires the integrating function to have locally bounded
variation in order that the Riemann-Stieltjes sums converge, and Brownian
sample paths are of unbounded variation on every interval.

We define a stochastic integral as

    W (t) = ∫₀ᵗ f (τ ) dX(τ ) = lim_{n→∞} Σ_{j=1}^n f (tj−1) (X(tj) − X(tj−1)),

with tj = jt/n.
The integrand is evaluated in the summation at the left-hand point tj−1. It
is important that each function evaluation does not know about the random
increment that multiplies it; i.e., the integration is non-anticipatory.
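The left-point sums above can be evaluated directly for the integrand f = X itself. The sketch below (standard-library Python; the helper name and parameters are our own) compares the non-anticipatory sum against the closed form ½X²(t) − ½t that follows from Itô's lemma:

```python
import math
import random

def ito_integral_of_w(t_final, n, seed=1):
    """Left-point (non-anticipatory) approximation of the integral of W dW."""
    rng = random.Random(seed)
    dt = t_final / n
    w, total = 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        total += w * dw              # integrand frozen at the left endpoint
        w += dw
    return total, w

approx, w_t = ito_integral_of_w(1.0, 100_000)
exact = 0.5 * w_t ** 2 - 0.5 * 1.0   # Ito value: W(t)^2/2 - t/2
```

The correction term −t/2, absent in ordinary calculus, is exactly what the non-anticipatory choice of evaluation point produces.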

11.11.2 Stochastic Differential Equations


We say that (Xt) satisfies the stochastic integral equation

    Xt = X0 + ∫₀ᵗ µ(s, Xs) ds + ∫₀ᵗ σ(s, Xs) dBs,

or, in differential form,

    dXt = µ(t, Xt) dt + σ(t, Xt) dBt,

where µ and σ are chosen so that the integrals make sense.
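A standard way to simulate such an equation numerically is the Euler–Maruyama scheme, which advances the differential form one small step at a time. The sketch below is not from the text; the function name and the mean-reverting example coefficients are illustrative choices:

```python
import math
import random

def euler_maruyama(mu, sigma, x0, t_final, n, seed=2):
    """One Euler-Maruyama sample path of dX = mu(t, X) dt + sigma(t, X) dB."""
    rng = random.Random(seed)
    dt = t_final / n
    t, x = 0.0, x0
    xs = [x0]
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x = x + mu(t, x) * dt + sigma(t, x) * db
        t += dt
        xs.append(x)
    return xs

# Mean-reverting example: dX = -X dt + 0.1 dB, started at X(0) = 1.
ou_path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.1,
                         x0=1.0, t_final=5.0, n=5000)
```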

Itô’s lemma

    dF = (dF/dX) dX + (1/2) (d²F/dX²) dt,

where F (X) is a function of Brownian motion X(t).

Application: An important application of stochastic calculus is in quanti-


tative finance, in which asset prices are often assumed to follow stochastic
differential equations. In the Black-Scholes model, prices are assumed to fol-
low geometric Brownian motion.
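For geometric Brownian motion dS = µS dt + σS dB, Itô's lemma yields the exact solution S(t) = S(0) exp((µ − σ²/2)t + σB(t)), which can be sampled directly. A sketch (parameter values and the helper name are illustrative, not from the text):

```python
import math
import random

def gbm_path(s0, mu, sigma, t_final, n, seed=3):
    """Sample geometric Brownian motion via its exact solution
    S(t) = S(0) * exp((mu - sigma**2 / 2) * t + sigma * B(t))."""
    rng = random.Random(seed)
    dt = t_final / n
    prices, b, t = [s0], 0.0, 0.0
    for _ in range(n):
        b += rng.gauss(0.0, math.sqrt(dt))   # running Brownian motion B(t)
        t += dt
        prices.append(s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * b))
    return prices

prices = gbm_path(s0=100.0, mu=0.05, sigma=0.2, t_final=1.0, n=252)
```

Because the exponential is always positive, simulated prices never go negative, one reason this model is preferred for assets over plain Brownian motion.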
A fine account of stochastic calculus useful for financial engineering is
presented in Wilmot [54]. It may be observed that ordinary rules of calculus
do not generally hold in a stochastic environment. For example, if F (X) = X²,
then dF = 2X dX in ordinary calculus, but dF = 2X dX + dt in stochastic calculus.
This follows from Itô’s lemma: with F (X) = X²,

    dF/dX = 2X,   d²F/dX² = 2,

and we obtain the result dF = 2X dX + dt. For a comprehensive account, see [19],
[30], [50] and [54].

11.12 Exercises
11.1. (a) Write the fractal dimensions of well-known fractals like
(i) Cantor set
(ii) Sierpinski triangle
(iii) Von Koch curve
(b) Can you give examples of two physical situations which are not
necessarily similar (same) but have the same fractal dimension?
11.2. Write a note on irregular geometry.
11.3. Explain the concept of IFS with a concrete example.
11.4. Define the Hausdorff metric. Is the Hausdorff metric space a metric space?
11.5. Discuss connections between fractals and strange attractors such as the
Lorenz, Rössler and Hénon attractors.
11.6. Write an essay on relations between fractals and chaos.
11.7. Is it possible to use fractals for weather forecasting or predictions of
earthquakes and tsunamis?
11.8. Give examples of time series representing physical phenomena.
11.9. What is the Hurst exponent? What is the relation between the Hurst
parameter and the fractal dimension? Who initiated the study of fractals?
What do you know about multifractals?
11.10. Give examples of feedforward networks.
11.11. Prove that the number of vertices of odd degrees in a graph is always
even.
11.12. Indicate areas of science, social science, linguistics, engineering and tech-
nology where graph theory has been used. Is there any relevance of graph
theory to big data?
11.13. Draw simple graphs of one, two and three vertices.
11.14. Show that the maximum number of edges in a simple graph with n
vertices is n(n − 1)/2.
11.15. Let X(t) denote Brownian motion. Show that:
(i) ∫₀ᵗ X(τ ) dX(τ ) = (1/2) X²(t) − (1/2) t;
(ii) ∫₀ᵗ f (τ ) dX(τ ) = f (t)X(t) − ∫₀ᵗ X(τ ) df (τ ), where f (t) is a bounded
and continuous function on [0, t].
Relation (ii) is known as the integration by parts formula of stochastic calculus.

11.16. Let B-splines be defined as

    B₀(x) = χ[−1/2,1/2](x),

and for any natural number n define

    Bₙ(x) = (Bₙ₋₁ ∗ B₀)(x) = ∫_{x−1/2}^{x+1/2} Bₙ₋₁(t) dt,

where ∗ denotes convolution. Show that B₁(x) = (1 − |x|) χ[−1,1](x) and that
B₂(x) is a piecewise polynomial of degree 2.
11.17. A spline is a function f : R → R for which one can divide R into
intervals in such a way that f is a polynomial on each interval. The
points at which the function changes from one polynomial to another are
called knots. For example,

    f (x) = 0        for x ∈ (−∞, 0),
    f (x) = 2x²      for x ∈ (0, 1),
    f (x) = 2 − x    for x ∈ (1, 4),
    f (x) = x³/16    for x ∈ (4, ∞).

Find the knots. Show that B-splines are splines.
11.18. Find the Fourier transform of B₁(x) (the B-spline of order 1).
11.19. Explain how splines can be used in finite element methods.
11.20. Write an essay on B-splines and wavelet expansions.

11.13 Suggestion for Further Reading


Partial differential equations provide models for important problems in the
natural and medical sciences and all engineering fields. A natural question
concerns phenomena modelled by partial differential equations on fractals such
as the Cantor set, Sierpinski gasket and Von Koch curve. In this connection we
recommend Mosco [32], Pelander [38] and Strichartz [46] through [49]. For an
interesting introduction to the Sierpinski gasket we suggest Stewart [45]. We
recommend Falconer [10], Nekka and Li [33, 34], Siddiqi [44] and Triebel [51]
for mathematical foundations and effective applications.
Applications of fractal dimension analysis and predictions of meteorologi-
cal data of India, Saudi Arabia and Mexico are presented respectively in Ran-
garajan [39], Siddiqi [43], Rehman and Siddiqi [42] and Velasquenz et al. [53].

A study of these references may be helpful to carry out further investigation
in this field.
For applications of fractals to earth sciences, especially to earthquake and
tsunami prediction, we refer to Dimri [7, 8] and Turcotte [52]. For a thorough
study of stochastic calculus we refer to Klebaner [19], Shreve [50] and Wilmot
[54]. For a deeper study of splines we recommend Holling [15] and Christensen
[5, Chapter 10]. For further details we refer to Neunzert and Siddiqi [35] and
Monk [31]. A wide range of emerging topics useful for engineers has been
introduced in this chapter. For extensive discussion we refer to [2] through [5],
[9], [11] through [14], [16], [20], [23] through [25], [36], [37] and [55].
Bibliography

[1] P. S. Addison, The Illustrated Wavelet Transform Handbook, Institute of


Physics Publishing, Philadelphia, 2002.

[2] A. Z. Averbuch, Spline and Spline Wavelet Methods with Applications to


Signal and Image Processing, Volume I, Springer, 2014.
[3] M. Brokate and A. H. Siddiqi, Functional Analysis with Current Ap-
plications in Science, Technology and Industry, Second edition, Long-
man/CRC Press, 2014.
[4] J. Chakravarty, Introduction to MATLAB Programming, Tool Box and
Simulink, Universities Press, 2014.
[5] O. Christensen, Functions, Spaces and Expansions , Birkhauser, 2010.

[6] N. Deo, Graph Theory with Applications to Engineering and Computer


Science, Dover Publications, 2016.
[7] V. P. Dimri, Application of Fractals in Earth Sciences, Oxford and IBH
Publishing, Calcutta, 2000.
[8] V. P. Dimri (Ed.) Fractal Behaviour of the Earth System, Springer, 2005.

[9] Z. Elhadj, Lozi Mappings: Theory and Applications, CRC Press, 2013.
[10] K. Falconer, Fractal Geometry: Mathematical Foundations and Applica-
tions, Second Edition, Wiley-Blackwell, 2003.
[11] D. P. Feldman, Chaos and Fractals: An Elementary Introduction, Oxford
University Press, 2012.
[12] E. Foufoula-Georgiou and P. Kumar (Eds), Wavelets in Geophysics, Aca-
demic Press, San Diego, 1994.
[13] K. M. Furati, Z. Nashed and A. H. Siddiqi, Mathematical Models & Meth-
ods for Real World Systems, Chapman & Hall, New York, 2006.
[14] I. Gorban, Randomness and Hyper-Randomness, Springer, 2017.
[15] K. Holling, Finite Element Methods with B-Splines, Frontiers in Applied
Mathematics, SIAM, Philadelphia, 2003.


[16] M. Javadov and A. Kh. Janahmadov, Synergetics and Fractals in Tribol-


ogy, Springer, 2016.

[17] J. Kigami, A harmonic calculus on the Sierpinski spaces. Japan J Appl.


Math. 6, 259-290. 1989.
[18] J. Kigami, Analysis on Fractals, Cambridge University Press, 2001.
[19] F. C. Klebaner, Introduction to Stochastic Calculus with Applications,
Imperial College Press, 2005.
[20] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems
Approach To Machine Intelligence, Prentice-Hall, 2009.
[21] J. Kumar and E. Foufoula-Georgiou, Wavelet analysis for geophysical
applications, Rev. GeoPhysics, 35, 385-412, 1997.
[22] J. Kumar, P. Manchanda and N. A. Sontake, Wavelet methods for In-
dian rainfall data, In Mathematical Models and Methods for Real World
Systems, Chapman & Hall, New York, 2006, pp. 179-210.
[23] R. S. T. Lee, Fuzzy-Neuro Approach to Agent Applications, Springer,
2008.
[24] P. Manchanda, J. Kumar, F. Khene and A. H. Siddiqi, Wavelet based mul-
tifractal analysis of Indian rainfall data, In Modern Mathematical Models,
Methods and Algorithms for Real World Systems, Anamaya and Anshan,
www.anshan.co.uk, 2006.

[25] B. B. Mandelbrot, Fractals: Form, Chance and Dimension. Freeman,


1977.
[26] B. B. Mandelbrot and J. Van Ness, Fractional Brownian motion, frac-
tional noises and applications, SIAM Rev. 10: 422-427, 1968.

[27] B. B. Mandelbrot and R.L. Hudson, The (Mis)Behaviour of Markets,


Basic Books, New York, 2004.
[28] Y. Meyer, Wavelet Vibrations and Scalings, Vol. 9, CRM Monograph
Series AMS, 1998.

[29] Y. Meyer (with S. Jaffard and R. D. Ryan), Wavelets: Tools for Science
and Technology, SIAM, Philadelphia, 2001.
[30] G. Mircea, Stochastic Calculus, Applications in Science and Engineering,
Springer, 2002.
[31] P. Monk, Finite Element Methods for Maxwell’s Equations, Oxford Sci-
ence Publications, 2003.

[32] U. Mosco, Energy functional on certain fractal structures, J. Convex.


Anal. 9: 581-600, 2002.

[33] F. Nekka and J. Li, Introduction of triadic Cantor sets with their trans-
lates. Fundamental properties, Chaos, Solitons Fractals, 13(9): 1807-1817,
2002.
[34] F. Nekka and J. Li, A continuous translation based method to reveal the
fine structure of fractal sets, Arabian Journal for Science and Engineer-
ing, 28, Part I, 169-188, 2003.
[35] H. Neunzert and A. H. Siddiqi, Topics in Industrial Mathematics: Case
Studies and Related Mathematical Methods, Kluwer, 2000.
[36] A. M. Ovrutsky and A. S Prokhoda, Computational Materials Science:
Surfaces, Interfaces, Crystallization, Kindle Edition, Elsevier, 2013.
[37] B. Oksendal, Stochastic Differential Equations: An Introduction with Ap-
plications, Springer, reprint, 2003.
[38] A. Pelander, A Study of Smooth Functions and Differential Equations on
Fractals, Dissertation, Uppsala university, Sweden, 2007.
[39] G. Rangarajan, and D.A. Sant, Fractal dimension analysis of Indian cli-
matic dynamics, Chaos, Solitons Fractals 19: 285-291, 2004.
[40] G. Rangarajan, M. Ding (Eds.) Processes with Long-Range Correlations
Theory and Applications, Springer-Berlin, Heidelberg and New York,
2003.
[41] S. Rehman, A. H. Siddiqi, Wavelet-based correlation coefficient of time
series of Saudi meteorological data, Chaos, Solitons Fractals, 39: 1764-
1789, 2009.

[42] S. Rehman, A. H. Siddiqi, Wavelet-based Hurst exponent and fractal di-


mensional analysis of Saudi climatic dynamics, Chaos, Solitons, Fractals
, 40: 1081-1090, 2009.
[43] A. H. Siddiqi, Research Project on Wavelet and Fractal Methods for the
Analysis of Meteorological Data, King Fahd University of Petroleum and
Minerals, Dhahran Saudi Arabia, 2006.

[44] A. H. Siddiqi, Theme issues: wavelet and fractal methods in science and
engineering, Arab. J. Sci. Eng., 28-29, 2003-2004.
[45] I. Stewart, Four encounters with Sierpinski Gasket, Math. Intel., 17: 52-
64, 1995.

[46] R. S. Strichartz, Fractafolds based on the Sierpinski gasket and their spec-
tra, Trans. Amer. Math. Soc. 355: 4019-4043, 2003.

[47] R. S. Strichartz, Analysis on products of fractals, Trans. Amer. Math.


Soc. 357: 571-615, 2005.
[48] R. S. Strichartz, Laplacians on fractals with spectral gaps have nicer
Fourier series, Math. Res. Lett. 12: 269-274, 2005.
[49] R.S. Strichartz, Differential Equations on Fractals: A Tutorial, Princeton
University Press, 2006.
[50] S. Shreve, Stochastic Calculus for Finance II: Continuous-Time Models,
Springer, reprint, 2010.
[51] H. Triebel, Fractals and Spectra. Monographs in Mathematics 91,
Birkhäuser, 1997.
[52] D. L. Turcotte, Fractals and Chaos in Geology and Geophysics, Second
Edition, Cambridge University Press, 1997.
[53] M. A. Valle Velasquenz, G. M. Garcia, I. S. Cohon, L. K. Oleschko, J.
A. R. Correl and G. Koruin, Spatial variability of Hurst exponent for
the daily scale rainfall series in the state of Zacatecas Mexico, J. Am.
Meterol. Soc., 2771-2780, 2013.
[54] P. Wilmot, Derivatives, The Theory and Practice, John Wiley & Sons,
Chichester, 1998.
[55] Y. W. Biao Biao et al., Fuzzy logic and neuro-fuzzy systems: a systematic
introduction, Int. J. of Artificial Intel. Expert Sys. (IJAE), 2(2), 2011.
[56] A. Areodo, E. Bacry and J. Muzy, Solving the inverse fractal problem
from wavelet analysis, European Physics Letter, 25: 7, 479-556, 1994.
[57] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1998.
[58] E. Magosso, M. Ursino, A. Zaniboni et al., A wavelet based energetic
approach for the analysis of biomedical signals, J. Appl. Math. Comput.,
207:42-62, 2009.
[59] R. Eberhart and J. Kennedy, A new optimizer using particle swarm the-
ory, Proceedings of the Sixth International Symposium on Micro Machine
and Human Science, IEEE, 1995.
[60] J. Kennedy and R. Eberhart, Particle swarm optimization, Proceedings
of the IEEE International Conference on Neural Networks, IEEE, 1995.
[61] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine
Learning, Addison-Wesley, Reading, MA, 1989.
[62] E. Araujo, L. S. Coelho, Particle swarm approaches using Lozi map
chaotic sequences to fuzzy modelling of an experimental thermal-vacuum
system, Applied Soft Computing, 8: 1354-1364, 2008.

[63] M. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press,


2003.
Appendix A
Basic Concept of Calculus

A.1 Number System
A.2 Intervals, Absolute Value and Inequalities
A.3 Binomial Formula and Quadratic Formula
A.4 Analytic Geometry and Trigonometry
A.5 Logarithmic and Exponential Functions
A.6 Integration by Parts
A.7 Integration Formulas
A.8 Gamma Function
A.9 Definition of Multiple Integrals

A.1 Number System


The simplest numbers are the natural numbers 1, 2, 3, 4, 5, . . . . The num-
bers . . . , −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, . . . are called integer numbers or
simply integers. We denote the set of all natural numbers by N and the set
of all integers by Z. It is clear that N ⊂ Z, that is, the set of natural numbers
is a subset of the set of integers.
The numbers of the form p/q, where p and q are integers with q ≠ 0, are
called rational numbers. For example 3/4, −2/5 and 11/2 are rational
numbers. We denote the set of all rational numbers by Q. We have Z ⊂ Q,
since any integer p can be written as the ratio p/1.
Numbers which cannot be written as the ratio of two integers are called
irrational. For example √2, 1 + √2, √5 and π are irrational numbers. (In
Section 1.6 we present Euclid’s proof that √2 is irrational.) Together, rational
and irrational numbers form what is called the real number system. Thus,
a real number is either rational or irrational. The set of all real numbers is
denoted by R, and Q ⊂ R.
We assume that the reader is familiar with the algebraic operations (ad-
dition, subtraction, multiplication and division) for numbers. Recall that di-
vision by zero is not possible.
Since the square x2 of a real number x is non-negative, there cannot be any
real number which satisfies the equation x2 = −1, or x2 + 1 = 0. Nevertheless,


it has been found very useful in mathematics to introduce the number
i = √−1, so that i² = −1; i is called the imaginary unit. Numbers of the form
a + bi, where a and b are real numbers, are called complex numbers. For
example, 2 + 5i and √3 − 6i are complex numbers. A complex number a + bi
is called imaginary (or pure imaginary) if a = 0, so 9i is an imaginary
number. Since a = a + 0i, every real number a can be interpreted as a complex
number, as for example 2/3; thus R ⊂ C. The algebraic operations are defined
for complex numbers in the natural way, for example

    (3 + 5i) + (4 − √2 i) = 3 + 4 + 5i − √2 i = 7 + (5 − √2) i,
    (3 + 5i) − (4 − √2 i) = 3 − 4 + 5i + √2 i = −1 + (5 + √2) i,
    (3 + 5i) · (4 − √2 i) = 3 · 4 + 5i · 4 − 3√2 i − 5i · √2 i
                          = (12 + 5√2) + (20 − 3√2) i.

The division of two complex numbers is a bit more complicated; in general
one has

    (a + bi)/(c + di) = (a + bi)(c − di) / ((c + di)(c − di))
                      = (ac + bd)/(c² + d²) + ((bc − ad)/(c² + d²)) i.
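The division rule can be checked against Python's built-in complex arithmetic; `divide` below is an illustrative helper of our own that implements the conjugate trick directly:

```python
def divide(a, b, c, d):
    """(a + bi) / (c + di), computed by multiplying through by c - di."""
    denom = c * c + d * d
    return complex((a * c + b * d) / denom, (b * c - a * d) / denom)

z = divide(3, 5, 4, -2)              # (3 + 5i) / (4 - 2i)
```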
Rational and irrational numbers can be distinguished by their decimal repre-
sentations. Decimals like 12.345 = 12.3450000. . . , where only zeroes appear
from some point onward, are called terminating decimals. They correspond
to those rational numbers p/q whose denominator q has only 2’s and 5’s as
prime factors. Every other rational number has a periodic decimal represen-
tation, that is, a representation where a certain string of numbers is repeated
from some point onward, for example

    31/60 = 0.51666666... ,
    13/17 = 0.76470588235294117647... .
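The periodicity of such expansions can be observed by carrying out the long division programmatically; the helper below (our own illustration, using the standard-library `fractions` module) emits decimal digits one at a time:

```python
from fractions import Fraction

def decimal_digits(frac, n):
    """First n digits after the decimal point of a positive Fraction,
    produced by ordinary long division."""
    frac -= int(frac)
    num, den = frac.numerator, frac.denominator
    digits = []
    for _ in range(n):
        num *= 10
        digits.append(num // den)
        num %= den
    return digits

digits_13_17 = decimal_digits(Fraction(13, 17), 20)
```

For 13/17 the first sixteen digits 7647058823529411 then start repeating, confirming the period of length 16 seen above.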
Irrational numbers have non-periodic decimal representations. For example,
the decimal representations

    π = 3.141592653... ,
    √2 = 1.414213562...

do not exhibit any periodic repetition. Moreover, if we truncate a non-


terminating decimal representation (periodic or non-periodic) of a real number
x at some point, the resulting terminating decimal will be only an approxi-
mation to x.

A.2 Intervals, Absolute Value and Inequalities


We say that a real number x is less than another real number y, and write
x < y, if x − y is less than zero, that is, x − y is negative or y − x is positive.
Writing x ≤ y (read “x is less than or equal to y”) means that x − y is less
than or equal to 0, that is, x − y is negative or equal to 0.
The set of all real numbers x such that a ≤ x ≤ b is called the closed
interval from a to b and denoted as [a, b]. The set of all real numbers x such
that a < x < b is called the open interval from a to b and denoted as (a, b).
Definition A.1. The absolute value or magnitude of a real number x is
denoted by |x| and is defined as

    |x| = x if x > 0,   |x| = 0 if x = 0,   |x| = −x if x < 0.

Properties of the absolute value. For any real numbers x and y, |x − y|


represents the distance of x and y on the real line. (Thus |x| equals the distance
of x from 0.) Moreover, the following properties hold.

(a) √(x²) = |x|.
(b) | − x| = |x|. (A number and its negative have the same absolute value.)
(c) |xy| = |x||y|. (The absolute value of a product is the product of the
absolute values.)
(d) |x + y| ≤ |x| + |y|. (This is called the triangle inequality.)
(e) ||x| − |y|| ≤ |x − y|. (This is called the inverse triangle inequality.)

A.3 Binomial Formula and Quadratic Formula


Binomial Formulas

n(n−1) n−2 2 n(n−1)(n−2) n−3 3


(a) (x + y)n = xn + nxn−1 y + 1.2 x y + 1.2.3 x y + ... +
nxy n−1 + y n .
n(n−1) n−2 2 n(n−1)(n−2) n−3 3
(b) (x − y)n = xn − nxn−1 y + 1.2 x y − 1.2.3 x y + ... ±
nxy n−1 + ∓y n .
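The coefficients n(n−1)/(1·2), ... are the binomial coefficients C(n, k), available in Python as `math.comb`; the sketch below (the helper name is our own) evaluates the expansion term by term and compares it with (x + y)^n:

```python
import math

def expand_binomial(x, y, n):
    """Evaluate (x + y)**n term by term as the sum of C(n, k) x**(n-k) y**k."""
    return sum(math.comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))

value = expand_binomial(2, 3, 5)     # should equal (2 + 3)**5 = 3125
```

Formula (b) is the same sum with y replaced by −y, which the helper handles automatically.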

Quadratic Formula. The solutions of the quadratic equation ax² + bx + c = 0
are

    x = (−b ± √(b² − 4ac)) / (2a).
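A direct transcription of the quadratic formula follows (the helper name is ours; `cmath.sqrt` is used so that a negative discriminant yields the complex roots discussed in Section A.1):

```python
import cmath

def solve_quadratic(a, b, c):
    """Both roots of a*x**2 + b*x + c = 0 via the quadratic formula."""
    disc = cmath.sqrt(b * b - 4 * a * c)   # complex sqrt handles b^2 - 4ac < 0
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

r1, r2 = solve_quadratic(1, -3, 2)       # x**2 - 3x + 2 = (x - 1)(x - 2)
s1, s2 = solve_quadratic(1, 0, 1)        # x**2 + 1 = 0: complex roots +-i
```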

A.4 Analytic Geometry and Trigonometry


Rectangular coordinates. The real line consists of all real numbers;
each point on the real line is associated with a real number. Let us now con-
sider the plane. A rectangular coordinate system (also called Cartesian
coordinate system) consists of two perpendicular coordinate lines, called
coordinate axes. See Figure A.1. The intersection of the two axes is called the
origin of the coordinate system and denoted by O. Usually, the coordinate
axes are chosen along the horizontal and vertical direction; the horizontal axis
is called the x-axis, the vertical axis is called the y-axis. In this case the plane
and the axis together are referred to as the xy-plane. Any point P in the plane


FIGURE A.1: Rectangular Coordinate System

is represented by a pair (x1 , y1 ) of real numbers, x1 is called the x-coordinate


of P and y1 is called the y-coordinate of P . Points on the x-axis have the form
(x1 , 0), while points on the y-axis have the form (0, y1 ). The origin O has the
coordinates (0, 0). The vertical line through a point (x1 , 0) and the horizontal
line through (0, y1 ) are represented, respectively, by the equations

x = x1 and y = y1 .

Lines, distances, circles. Let P1 and P2 be two points in the xy-plane,


having coordinates (x1 , y1 ) and (x2 , y2 ) respectively. Consider the straight
line passing through P₁ and P₂. Whenever x₁ ≠ x₂,

    m = (y₂ − y₁)/(x₂ − x₁)
is called the slope of this line; the line is described by the equation

y = m(x − x1 ) + y1 .

The distance between P₁ and P₂ is defined by

    d(P₁, P₂) = d((x₁, y₁), (x₂, y₂)) = √((x₂ − x₁)² + (y₂ − y₁)²).

Indeed, this number yields the length of the line segment which connects P₁
and P₂, by the theorem of Pythagoras. As a particular case, the distance of
P₁ = (x₁, 0) and P₂ = (x₂, 0) equals

    d(P₁, P₂) = √((x₂ − x₁)²) = |x₂ − x₁|.

The equation of a circle with center (x0 , y0 ) and radius r is given by

(x − x0 )2 + (y − y0 )2 = r2 .

If (x0 , y0 ) = (0, 0) the equation becomes x2 + y 2 = r2 and describes the circle


centered at the origin with radius r.
Polar coordinates. Instead of Cartesian coordinates, it is often more conve-
nient to use polar coordinates (Figure A.2) which we usually denote by (r, θ).
The Cartesian coordinates (x, y) and the polar coordinates (r, θ) of a given
point P in the plane are related by

x = r cos θ , y = r sin θ . (A.1)

The number r equals the length of the line segment joining the origin O and
the point P , and θ equals the angle between the line OP and the x-axis. Note
that θ is usually measured in radians, that is, an angle of 90° has θ = π/2, the
full angle of 360° has θ = 2π, and so on. Formula (A.1) expresses the Cartesian
coordinates of P in terms of the polar coordinates of P. Conversely, we obtain
the polar coordinates of P from its Cartesian coordinates by the formula
    r = √(x² + y²),   θ = tan⁻¹(y/x). (A.2)
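Note that the one-argument arctangent in (A.2) determines θ only up to the quadrant; in code the two-argument form `atan2(y, x)` resolves this automatically. A sketch of the two conversions (the helper names are ours):

```python
import math

def to_polar(x, y):
    """Cartesian -> polar; atan2 picks the correct quadrant, unlike tan^-1(y/x)."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Polar -> Cartesian via formula (A.1)."""
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(0.0, 2.0)        # on the positive y-axis: theta = pi/2
```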

Trigonometric functions

The trigonometric functions are directly related to the geometry of the circle.


FIGURE A.2: Polar Coordinates


FIGURE A.3: Trigonometric Functions and Circle



Consider the point P with Cartesian coordinates (x, y) and polar coordinates
(r, θ) as in Figure A.3. The standard trigonometric functions are defined as
follows.
    sin θ = y/r (read as “sine of θ”),
    cos θ = x/r (read as “cosine of θ”),

and analogously

    tan θ = y/x,   cot θ = x/y,   sec θ = r/x,   csc θ = r/y,
called the tangent, the cotangent, the secant and the cosecant, respectively.
From elementary geometry we see that the definitions above do not depend
on the chosen value of r, as long as r > 0, so it suffices to consider the case
r = 1 (the unit circle). See Figure A.4.
Recall that θ is usually measured in radians (θ = π/2 corresponds to an
angle of 90o and so on).
Trigonometric Identities
A trigonometric identity is an equation involving trigonometric functions that
is true for all angles for which both sides of the equation are defined. We state
a few useful trigonometric identities.

FIGURE A.4: Trigonometric Functions and the Circle

    cos²θ + sin²θ = 1. (A.3)
    1 + tan²θ = sec²θ. (A.4)
    1 + cot²θ = csc²θ. (A.5)
    sin(θ + 2π) = sin(θ − 2π) = sin θ. (A.6)
    cos(θ + 2π) = cos(θ − 2π) = cos θ. (A.7)

In general,

    sin(θ ± 2nπ) = sin θ, for n = 0, 1, 2, 3, ... ,
    cos(θ ± 2nπ) = cos θ, for n = 0, 1, 2, 3, ... . (A.8)
    tan(θ + π) = tan θ,   tan(θ − π) = tan θ. (A.9)

Next we give formulas which involve sums or multiples of angles:

    sin(α + β) = sin α cos β + cos α sin β. (A.10)
    cos(α + β) = cos α cos β − sin α sin β. (A.11)
    sin 2θ = 2 sin θ cos θ. (A.12)
    cos 2θ = cos²θ − sin²θ. (A.13)
    tan 2θ = 2 tan θ / (1 − tan²θ). (A.14)
    cos 2θ = 2 cos²θ − 1. (A.15)
    cos 2θ = 1 − 2 sin²θ. (A.16)
    tan(α + β) = (tan α + tan β)/(1 − tan α tan β). (A.17)
    tan(α − β) = (tan α − tan β)/(1 + tan α tan β). (A.18)
    sin α + sin β = 2 sin((α + β)/2) cos((α − β)/2). (A.19)
    sin α − sin β = 2 cos((α + β)/2) sin((α − β)/2). (A.20)
    cos α + cos β = 2 cos((α + β)/2) cos((α − β)/2). (A.21)
    cos α − cos β = −2 sin((α + β)/2) sin((α − β)/2). (A.22)
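Identities such as (A.3) and (A.12) can be spot-checked numerically at many sample angles; a small sketch with an illustrative helper of our own:

```python
import math

def check_identity(lhs, rhs, samples, tol=1e-12):
    """Compare two functions of theta at the given sample angles."""
    return all(abs(lhs(t) - rhs(t)) < tol for t in samples)

angles = [0.1 * k for k in range(1, 30)]
pythagorean = check_identity(lambda t: math.cos(t) ** 2 + math.sin(t) ** 2,
                             lambda t: 1.0, angles)            # identity (A.3)
double_angle = check_identity(lambda t: math.sin(2 * t),
                              lambda t: 2 * math.sin(t) * math.cos(t),
                              angles)                          # identity (A.12)
```

A numerical check of this kind does not prove an identity, but it quickly exposes a mistyped sign or coefficient.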
Theorem A.1. We have

    cos x < (sin x)/x < 1, for any x satisfying 0 < |x| ≤ π/2. (A.23)
Geometric proof. Assume first that x > 0. The proof is based on the rela-
tions between certain areas in Figure A.4. For a circle of radius 1, the figure

shows a sector of angle x with corners O, A and P , and additional auxiliary


points B and Q, connected to P and A by vertical lines of length sin x and
tan x respectively. The distance between O and B equals cos x. We then have
    area of triangle OAP = (1/2) · 1 · sin x,
    area of sector OAP = (1/2) x,
    area of triangle OAQ = (1/2) · 1 · tan x.
We see that
    (1/2) sin x < (1/2) x < (1/2) tan x = (1/2) (sin x / cos x),
since the triangle OAP is contained in the sector OAP , which in turn is
contained in the triangle OAQ. Multiplying with 2 and dividing by sin x results
in
    1 < x / sin x < 1 / cos x.
Using reciprocals yields the assertion (A.23) for x > 0. For x < 0 it is enough
to observe that
    cos(−x) = cos x,   sin(−x)/(−x) = (− sin x)/(−x) = sin x / x.
An alternative proof, for small values of x uses the power series repre-
sentation of the sine and the cosine. We have
    sin x = x − x³/3! + x⁵/5! − ... .
Therefore
    sin x / x = 1 − x²/3! + x⁴/5! − ... . (A.24)
On the other hand,
    cos x = 1 − x²/2! + x⁴/4! − ... . (A.25)
2! 4!
If we consider just the first two terms of the series, we get for x > 0

    1 − x²/2! < 1 − x²/3! < 1. (A.26)
If x is small enough, inequality (A.26) continues to hold even after we add
the remaining terms of the series (A.24) and (A.25), because those remaining
terms have exponents of x greater than 2, and hence the remaining sums have
the form x2 r(x) for some function r with limx→0 r(x) = 0.

Inverse Trigonometric Functions



The inverse trigonometric functions or cyclometric functions are ob-


tained as inverse functions of the trigonometric functions. However, to ensure
that this works out correctly in the sense of the general definition of an inverse
function, one has to restrict the domain of the original function to an interval
where the latter is increasing or decreasing, respectively. For example, the sine function
is increasing on the interval [−π/2, π/2]; thus by a theorem it has an inverse
defined on its range [−1, 1] with values in [−π/2, π/2]. This inverse function
is called an arcsine; more precisely, it is called the principal branch of
the arcsine. Indeed, on the interval [π/2, 3π/2] the sine is decreasing (again
with range [−1, 1]), so we could also define an inverse on [−1, 1] with range
[π/2, 3π/2], and similarly on other intervals.
In Table A.1 we list the principal branches of the six standard trigonomet-
ric functions. Their names are obtained by putting the syllable “arc” in front
of the original function.

TABLE A.1

Name          Standard notation  Domain            Range in radians             Range in degrees
arcsine       y = arcsin(x)      x ∈ [−1, 1]       −π/2 ≤ y ≤ π/2               −90° ≤ y ≤ 90°
arccosine     y = arccos(x)      x ∈ [−1, 1]       0 ≤ y ≤ π                    0° ≤ y ≤ 180°
arctangent    y = arctan(x)      x ∈ R             −π/2 < y < π/2               −90° < y < 90°
arccotangent  y = arccot(x)      x ∈ R             0 < y < π                    0° < y < 180°
arcsecant     y = arcsec(x)      x ≥ 1 or x ≤ −1   0 ≤ y < π/2 or π/2 < y ≤ π   0° ≤ y < 90° or 90° < y ≤ 180°
arccosecant   y = arccsc(x)      x ≤ −1 or x ≥ 1   −π/2 ≤ y < 0 or 0 < y ≤ π/2  −90° ≤ y < 0° or 0° < y ≤ 90°

The inverse function notations sin⁻¹, cos⁻¹ etc. are natural, but one must
be aware of the following notational conflict. Since we commonly write sin²x
instead of (sin x)², an unspecified use of sin⁻¹(x) might mean arcsin(x) as well
as 1/(sin x).

Inverse Hyperbolic Functions


The inverse hyperbolic functions are the inverses of the hyberbolic func-
tions sinh, cosh, tanh, coth, sech and csch. Similarly as in the case of trigono-
metric functions, one has to consider suitable domains and ranges in order to
define inverses in the sense of the general definition of an inverse function. (In contrast to
the trigonometric functions, this is not always necessary; sinh is invertible on

its entire domain R.) They are called area hyperbolic functions, and for
the mathematical notation the prefix “ar” precedes the name of the original
function, for example “arsinh” denotes the inverse of sinh. Table A.2 lists the
principal branches of the inverse hyperbolic functions.

TABLE A.2

Function Domain Range

y = arsinh(x) x ∈ (−∞, ∞) y ∈ (−∞, ∞)

y = arcosh(x) x ∈ [1, ∞) y ∈ [0, ∞)

y = artanh(x) x ∈ (−1, 1) y ∈ (−∞, ∞)

y = arcoth(x) x ∈ (−∞, −1) ∪ (1, ∞) y ∈ (−∞, 0) ∪ (0, ∞)

y = arsech(x) x ∈ (0, 1] y ∈ [0, ∞)

y = arcsch(x) x ∈ (−∞, 0) ∪ (0, ∞) y ∈ (−∞, 0) ∪ (0, ∞)

The inverse hyperbolic functions are expressible in terms of natural loga-


rithms. The formulas in Table A.3 hold for all x in the domains of the inverse
hyperbolic functions.
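The logarithmic formulas of Table A.3 can be checked against the standard-library implementations; a brief sketch (the function names mirror the text's arsinh and artanh):

```python
import math

def arsinh(x):
    """Table A.3 formula for the inverse of sinh."""
    return math.log(x + math.sqrt(x * x + 1))

def artanh(x):
    """Table A.3 formula for the inverse of tanh, valid for -1 < x < 1."""
    return 0.5 * math.log((1 + x) / (1 - x))
```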

TABLE A.3

arsinh(x) = ln(x + √(x² + 1))              arcosh(x) = ln(x + √(x² − 1))
artanh(x) = (1/2) ln((1 + x)/(1 − x))      arcoth(x) = (1/2) ln((x + 1)/(x − 1))
arsech(x) = ln((1 + √(1 − x²))/x)          arcsch(x) = ln(1/x + √(1 + x²)/|x|)

A.5 Logarithmic and Exponential Functions

If b > 0 and b ≠ 1, then for positive values of x the logarithm to the
base b of x is denoted by log_b x. If b = 10, logarithms are called common
logarithms. If b = e ≈ 2.718282 (≈ means approximately) then logarithms
are called natural logarithms.
f (x) = log₁₀ x or f (x) = logₑ x is called the logarithm function. logₑ x

is often denoted by ln x or ln |x|. We have y = e^x if and only if x = ln y,
for y > 0 and x any real number. f (x) = e^x is called the exponential
function.
(a) ln(xy) = ln x + ln y
(b) ln(x/y) = ln x − ln y, y > 0
(c) ln(x^r) = r ln x
(d) ln(1/x) = − ln x
(e) e^(x+y) = e^x e^y
(f) ln(e^x) = x for all real values of x
(g) e^(ln x) = x for x > 0
(h) ln 1 = 0, ln e = 1, ln(1/e) = −1 and ln(e²) = 2
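Laws (a)-(c) are easy to verify in floating point; in the following sketch the values 3.7 and 2.2 are arbitrary positive test numbers of our own choosing:

```python
import math

x, y = 3.7, 2.2                      # arbitrary positive test values
product_rule = math.log(x * y)       # law (a): equals ln x + ln y
quotient_rule = math.log(x / y)      # law (b): equals ln x - ln y
power_rule = math.log(x ** 4)        # law (c): equals 4 ln x
```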

A.6 Integration by Parts


Let u = f (x), du = f ′(x) dx, v = G(x), dv = G′(x) dx = g(x) dx. Then

    ∫ u dv = uv − ∫ v du.

If G(x) is an antiderivative of a function g(x) and f (x) is a differentiable
function, then the formula for integration by parts can be written as

    ∫ f (x) g(x) dx = f (x) G(x) − ∫ f ′(x) G(x) dx.

A.7 Integration Formulas


1. ∫ u^n du = u^(n+1)/(n + 1) + C, n ≠ −1
2. ∫ u^(−1) du = ∫ du/u = ln u + C
3. ∫ e^u du = e^u + C
4. ∫ u e^u du = e^u (u − 1) + C
5. ∫ sin u du = − cos u + C
6. ∫ cos u du = sin u + C
7. ∫ tan u du = ln sec u + C
8. ∫ cot u du = ln sin u + C
9. ∫ e^(au) sin mu du = e^(au) (a sin mu − m cos mu)/(m² + a²) + C
10. ∫ e^(au) cos mu du = e^(au) (m sin mu + a cos mu)/(m² + a²) + C
11. ∫ sec u du = ln(sec u + tan u) + C
12. ∫ csc u du = ln(csc u − cot u) + C
13. ∫ sec² u du = tan u + C
14. ∫ csc² u du = − cot u + C
15. ∫ sec u tan u du = sec u + C
16. ∫ csc u cot u du = − csc u + C
17. ∫ du/√(a² − u²) = arcsin(u/a) + C
18. ∫ du/(a² + u²) = (1/a) arctan(u/a) + C
19. ∫ du/√(u² ± a²) = ln(u + √(u² ± a²)) + C
20. ∫ du/(a² − u²) = (1/(2a)) ln((a + u)/(a − u)) + C
21. ∫ u sin u du = sin u − u cos u + C
22. ∫ u cos u du = cos u + u sin u + C
23. ∫ u^n ln u du = u^(n+1) [ln u/(n + 1) − 1/(n + 1)²] + C
24. ∫ sin² u du = u/2 − (1/4) sin 2u + C
25. ∫ cos² u du = u/2 + (1/4) sin 2u + C
26. ∫ tan² u du = tan u − u + C
27. ∫ u dv = uv − ∫ v du
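Any entry of the table can be spot-checked by numerical quadrature. The sketch below (a midpoint-rule helper of our own) verifies formula 4, whose antiderivative e^u (u − 1) gives ∫₀¹ u e^u du = 1:

```python
import math

def definite_integral(f, a, b, n=10_000):
    """Composite midpoint rule; accurate enough to check a table entry."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Formula 4: the antiderivative of u * e**u is e**u * (u - 1).
lhs = definite_integral(lambda u: u * math.exp(u), 0.0, 1.0)
rhs = math.exp(1.0) * (1 - 1) - math.exp(0.0) * (0 - 1)   # evaluates to 1
```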

A.8 Gamma Function


The gamma function is defined as

    Γ(x) = ∫₀^∞ t^(x−1) e^(−t) dt. (A.27)

Convergence of the integral requires x − 1 > −1, that is, x > 0. Integrating
(A.27) by parts, we get the recurrence relation

    Γ(x + 1) = x Γ(x).

Now, Γ(1) = ∫₀^∞ e^(−t) dt = 1. Therefore Γ(2) = 1 · Γ(1) = 1, Γ(3) =
2 · Γ(2) = 2 · 1, and so on. We see that if n is a positive integer, then
Γ(n + 1) = n!. The value Γ(1/2) = √π can be derived from (A.27) with x = 1/2.
Remark A.1. Although the integral form (A.27) does not converge for x < 0,
it can be shown by means of alternative definitions that the gamma function
is defined for all real and complex numbers except x = −n, n = 0, 1, 2, . . . .
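The recurrence, the factorial property and the special value Γ(1/2) = √π can be checked with the standard-library gamma function; this short sketch is an addition for illustration.

```python
import math

# Recurrence Γ(x + 1) = x Γ(x) at an arbitrary point, the factorial
# property Γ(n + 1) = n!, and Γ(1/2) = √π, all via math.gamma.
x = 3.7
recurrence_holds = math.isclose(math.gamma(x + 1), x * math.gamma(x))
gamma_5 = math.gamma(5)        # Γ(5) = 4! = 24
gamma_half = math.gamma(0.5)   # Γ(1/2) = √π
```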

A.9 Definition of Multiple Integrals


In this section we present the definitions of double and triple integrals,
∫∫_Q f(x, y) dA and, respectively, ∫∫∫_Q f(x, y, z) dV,

over rectangular regions Q. The exposition closely parallels that given in [6]
for the case of a single variable. We therefore shorten it somewhat and refer
the reader to [7] for more details.
Double integral
Let the function f be defined in a rectangular region Q = [a, b] × [c, d] of the xy-plane. We partition Q into rectangular subregions as follows. Choose a partition of [a, b] of the form a = x₀ < x₁ < · · · < xₙ = b, and a partition of [c, d] of the form c = y₀ < y₁ < · · · < y_m = d. This gives us a partition ∆ of Q into nm rectangles:

Q_ij = {(x, y) : x_{i−1} ≤ x ≤ x_i, y_{j−1} ≤ y ≤ y_j},  1 ≤ i ≤ n, 1 ≤ j ≤ m.
A Riemannian sum for the partition ∆ is defined as

s_∆ = Σ_{i=1}^n Σ_{j=1}^m f(ξ_i, η_j)(x_i − x_{i−1})(y_j − y_{j−1}).   (A.28)

Here, each point (ξ_i, η_j) lies somewhere in the rectangle Q_ij. We define the oscillation of f on the rectangle Q_ij as

O_ij(f) = osc_{Q_ij}(f) = max_{x,z∈Q_ij} |f(x) − f(z)|.   (A.29)

Again, we have to replace the maximum by the supremum if the former does not exist. Next, we consider two different Riemannian sums for the same partition ∆,

s_∆ = Σ_{i=1}^n Σ_{j=1}^m f(ξ_i, η_j)(x_i − x_{i−1})(y_j − y_{j−1}),
s̃_∆ = Σ_{i=1}^n Σ_{j=1}^m f(ξ̃_i, η̃_j)(x_i − x_{i−1})(y_j − y_{j−1}).   (A.30)

Their difference can be estimated as

|s_∆ − s̃_∆| ≤ Σ_{i=1}^n Σ_{j=1}^m O_ij(f)(x_i − x_{i−1})(y_j − y_{j−1}) =: V_∆(f).   (A.31)

The number V∆ (f ) defined in (A.31) is called the oscillation sum of f for


the partition ∆.
Definition A.2. [Integrable Function] A bounded function f : Q → R is said
to be integrable on Q, if for every ε > 0 there exists a partition ∆ of Q such
that V∆ (f ) ≤ ε.
The latter condition means that, in view of (A.31), we can enforce the
difference between different Riemannian sums for the same partition ∆ to
become as small as we want, if we choose ∆ fine enough.
Let now f : Q → R be integrable. In order to define its integral, one goes through the same three steps as in the case of a single variable.
1. One proves that

|s_∆ − s_{∆̃}| ≤ V_∆(f),

whenever the partition ∆̃ is a refinement of the partition ∆ (that is, ∆̃ is obtained from ∆ by adding partition points), for all Riemannian sums s_∆ and s_{∆̃} for those partitions.
2. One proves that

|s_∆ − s_{∆̃}| ≤ V_∆(f) + V_{∆̃}(f),

for arbitrary partitions ∆ and ∆̃ and all Riemannian sums for those partitions.

3. One proves that there exists a unique number I such that

|I − s∆ | ≤ V∆ (f ) (A.32)

holds for all partitions and all corresponding Riemannian sums.

We then define the integral of f on Q as

∫∫_Q f(x, y) dA = I,

where I is the number obtained in step 3 above.


Theorem A.2. Let f : Q → R be continuous. Then f is integrable on Q.
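The Riemannian sum (A.28) can be evaluated directly for a simple integrand. This added sketch (the function name `riemann_sum_2d` and the midpoint choice of (ξ_i, η_j) are assumptions for illustration) approximates ∫∫_Q (x + y) dA over Q = [0,1] × [0,1], whose exact value is 1.

```python
# Riemannian sum (A.28) on an equidistant n × n partition of
# Q = [0,1] × [0,1], with each (ξ_i, η_j) chosen as the midpoint of Q_ij.
def riemann_sum_2d(f, n=200):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            xi = (i + 0.5) * h   # midpoint in x
            eta = (j + 0.5) * h  # midpoint in y
            total += f(xi, eta) * h * h
    return total

approx = riemann_sum_2d(lambda x, y: x + y)   # exact value is 1
```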
Triple integral. Let the function f be defined in a rectangular region Q = [a₁, b₁] × [a₂, b₂] × [a₃, b₃] of three-dimensional space. We partition each of those intervals,

a₁ = x₀ < · · · < xₙ = b₁,  a₂ = y₀ < · · · < y_m = b₂,  a₃ = z₀ < · · · < z_l = b₃.

The partition ∆ now consists of rectangular regions

Q_ijk = [x_{i−1}, x_i] × [y_{j−1}, y_j] × [z_{k−1}, z_k].

The Riemannian sums have the form

s_∆ = Σ_{i=1}^n Σ_{j=1}^m Σ_{k=1}^l f(ξ_i, η_j, ζ_k)(x_i − x_{i−1})(y_j − y_{j−1})(z_k − z_{k−1}),

where (ξ_i, η_j, ζ_k) lies somewhere in Q_ijk. From this point onward, the definition of

∫∫∫_Q f(x, y, z) dV

proceeds in a completely analogous manner as in the case of the double integral.
Appendix B
Summary of Properties of Matrices

B.1 Properties of Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749


B.2 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753
B.3 Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
B.4 Sobolev Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754

B.1 Properties of Matrix


1. A matrix A is any rectangular array of numbers or functions:

       [ a11  a12  ...  a1n
   A =   a21  a22  ...  a2n
         ...  ...  ...  ...
         am1  am2  ...  amn ]
If a matrix A has m rows and n columns, we say A is an m × n matrix.
If m = n, A is called a square matrix of order n.
The element in the ith row and jth column is denoted by aij . We write
A = (aij )m×n . Each of the mn numbers is called an element of the
matrix.
2. Two m × n matrices A = (aij ) and B = (bij ) are equal if aij = bij for
each i and j.
3. A matrix having m rows and one column is called a column matrix. It is also called a column vector or simply a vector. For example,

       [ a11
   A =   a21
         ...
         am1 ]

   is a column vector. A matrix having a single row is called a row matrix.
4. A multiple of a matrix A by a number k is defined as kA = (kaij ).
5. The sum of two m × n matrices A and B is defined to be the matrix A + B = (a_ij + b_ij)_{m×n}; that is, we add the corresponding elements. The difference of A and B is defined as A − B = A + (−B).


6. The determinant having the same elements as the square matrix A is


called the determinant of the matrix and is denoted by |A| or det A.

7. A square matrix is said to be singular if its determinant is zero,


otherwise it is non-singular.
8. A square matrix all of whose elements outside the leading diagonal (a_ii) are zero is called a diagonal matrix. A diagonal matrix in which all the leading diagonal elements are equal is called a scalar matrix. For example,

   [ 3  0  0        [ 2  0  0
     0  1  0   and    0  2  0
     0  0 −1 ]        0  0  2 ]

   are diagonal and scalar matrices respectively.
9. A diagonal matrix of order n in which all diagonal elements are unity is called a unit matrix of order n and is denoted by I. For example,

   [ 1  0  0
     0  1  0
     0  0  1 ]

   is a unit matrix of order 3.
10. A square matrix A = [aij ] is said to be symmetric (skew symmetric)
if aij = aji (aij = −aji ) for all i and j. For a skew symmetric matrix,
leading diagonal elements are zero.
 
11. Let A = (a_ij) be an m × n matrix and B = (b_jk) be an n × p matrix. Then the product AB is the m × p matrix AB = (c_ik)_{m×p}, where

   c_ik = a_i1 b_1k + a_i2 b_2k + · · · + a_in b_nk.

   The expression for c_ik is known as the inner product of the ith row of A with the kth column of B. In general, matrix multiplication is not commutative; that is, AB ≠ BA.
   Note: For the product AB to be defined, the number of columns in A must equal the number of rows in B.
12. In the product AB, the matrix A is said to be post-multiplied by the matrix B. In the product BA, the matrix A is said to be pre-multiplied by B.

13. Multiplication of matrices is associative, that is (AB)C = A(BC),


where A, B, C are m × n; n × p and p × r matrices respectively.
14. If all products are defined, multiplication of matrices is distributive over addition; that is,

   A(B + C) = AB + AC and (B + C)A = BA + CA.

15. If A is a square matrix, then the product AA is defined as A². Similarly we define higher powers of A, e.g., A · A² = A³.
16. A matrix consisting of all entries zero is called a zero matrix and is
denoted by O.
17. If A and O are m × n matrices then

A + O = O + A = A.

If A is a square matrix of the same order as I then IA = AI = A. I is


called the multiplicative identity of matrix A.
18. The transpose of the m × n matrix A is the n × m matrix A′ (also written Aᵀ) obtained from A by interchanging rows and columns. In other words, the rows of a matrix A become the columns of its transpose A′ or Aᵀ. Observation: The transpose of the product of two matrices is the product of their transposes taken in reverse order; that is, (AB)′ = B′A′.
19. If A is an n × n matrix and there exists an n × n matrix B such that AB = BA = I, where I is the multiplicative identity, then B is called the inverse of A and is denoted by B = A⁻¹.
20. An n × n matrix A has inverse A−1 if and only if A is non-singular.
21. Formula for the inverse: A⁻¹ = (Adj A)/det(A), where Adj A (the adjoint of A) is the transposed matrix of cofactors of A.
22. If A(t) = (a_ij(t))_{m×n} is a matrix whose entries are functions differentiable on a common interval, then the derivative of A is defined as

   dA/dt = ( d a_ij /dt )_{m×n}.

   The derivative of a matrix A(t) is also denoted by A′(t).



23. If A(t) = (a_ij(t))_{m×n} is a matrix whose entries are continuous on a common interval containing t and t₀, then

   ∫_{t₀}^t A(s) ds = ( ∫_{t₀}^t a_ij(s) ds )_{m×n}.

   To differentiate (integrate) a matrix of functions we simply differentiate (integrate) each entry.
24. Matrices are used in solving an algebraic system of n linear equations in n unknowns:

   a₁₁x₁ + a₁₂x₂ + · · · + a₁ₙxₙ = b₁
   a₂₁x₁ + a₂₂x₂ + · · · + a₂ₙxₙ = b₂
   ...
   aₙ₁x₁ + aₙ₂x₂ + · · · + aₙₙxₙ = bₙ.

   If A denotes the matrix of coefficients of the above system, Cramer's rule can be used to solve the system whenever det A ≠ 0. The above system of equations can be written in the form of the matrix equation

   AX = B,

   where A = (a_ij)_{n×n}, X = (x₁, x₂, . . . , xₙ)ᵀ and B = (b₁, b₂, . . . , bₙ)ᵀ. The matrix [A | B], obtained by adjoining the column B to A, is called the augmented matrix. Gauss elimination or Gauss-Jordan elimination can be used to solve the system of equations.
25. Rank of a matrix is the largest order of any non-vanishing minor of the
matrix.
26. If A is any square matrix of order n and I is the nth order unit matrix,
the determinant of the matrix A − λI equated to zero is called the
characteristic equation of A, that is |A − λI| = 0 gives a characteristic
equation of A. The roots of the characteristic equation are called the
characteristic roots or eigenvalues of the matrix A.
27. Cayley-Hamilton Theorem Every square matrix satisfies its charac-
teristic equation.

28. Let A be an n × n matrix. A number λ is called an eigenvalue of A if


there exists a non-zero solution vector X of the linear system

AX = λX.

The column vector X is said to be an eigenvector corresponding to the


eigenvalue λ.
Observation: Corresponding to n distinct eigenvalues we get n linearly independent eigenvectors. But when two or more eigenvalues are equal, it may or may not be possible to get linearly independent eigenvectors corresponding to the repeated roots.
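For a 2 × 2 matrix the characteristic equation is λ² − (trace A)λ + det A = 0, and the Cayley-Hamilton theorem (item 27) says the matrix satisfies it. This added sketch checks both for the sample matrix A = [[2, 1], [1, 2]] (all helper names, such as `matmul`, are my own, not from the text).

```python
import math

# A = [[2, 1], [1, 2]] has characteristic equation λ² − 4λ + 3 = 0,
# with eigenvalues λ = 1 and λ = 3.
A = [[2.0, 1.0], [1.0, 2.0]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

disc = math.sqrt(trace * trace - 4 * det)
eigenvalues = sorted([(trace - disc) / 2, (trace + disc) / 2])

def matmul(X, Y):
    # Product of two 2 × 2 matrices, c_ik = Σ_j x_ij y_jk.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A2 = matmul(A, A)
# Cayley-Hamilton: A² − (trace A)·A + (det A)·I should be the zero matrix.
residual = [[A2[i][j] - trace * A[i][j] + (det if i == j else 0.0)
             for j in range(2)] for i in range(2)]
```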

B.2 Metric Space


Definition 128. Let X be a non-empty set and d a function defined on
X × X into the set of real numbers R, d : X × X → R, satisfying the following
conditions:
(i) d(x, y) ≥ 0 ∀x, y ∈ X.
(ii) d(x, y) = 0 if and only if x = y.
(iii) d(x, y) = d(y, x) ∀x, y ∈ X.
(iv) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.
The number d(x, y) is called the distance between x and y, d is called the
metric and the pair (X, d) is called a metric space.
Example B.1. The set of all real numbers with distance

d(x, y) = |x − y|

is the metric space R¹.


Example B.2. The set of all ordered n−tuples

x = (x1 , x2 , ....., xn )

of real numbers with distance


d(x, y) = √((x₁ − y₁)² + (x₂ − y₂)² + · · · + (xₙ − yₙ)²)

is the metric space Rⁿ.
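The four axioms of Definition 128 can be spot-checked for the Euclidean metric of Example B.2 on a handful of points; this small sketch is an addition, and the point set `pts` is an arbitrary choice.

```python
import math, itertools

# Euclidean metric on R^n, as in Example B.2.
def d(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

pts = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0), (5.0, -2.0)]

# Axiom (iii): symmetry.
symmetric = all(d(p, q) == d(q, p) for p in pts for q in pts)
# Axiom (iv): triangle inequality (small slack for rounding).
triangle = all(d(p, q) <= d(p, r) + d(r, q) + 1e-12
               for p, q, r in itertools.product(pts, repeat=3))
# Axioms (i)-(ii): non-negativity, and d(x, y) = 0 iff x = y.
zero_iff_equal = all((d(p, q) == 0.0) == (p == q) for p in pts for q in pts)
```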



B.3 Hilbert Space


Definition 129. The space

L₂(a, b) = { real functions f on [a, b] : ∫_a^b |f(x)|² dx < ∞ }

is an example of a Hilbert space (a complete inner product space).

B.4 Sobolev Space


Definition 130. The space

H¹(0, 1) = { f ∈ L₂(0, 1) : f′ ∈ L₂(0, 1) }

is called the Sobolev space of order 1. Further,

H₀¹(0, 1) = { f ∈ H¹(0, 1) : f(0) = f(1) = 0 }.


Appendix C
Proof of Selected Theorems

C.1 Fundamental Theorem of Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755


C.2 Green-Ostrogradski Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760
C.3 The Divergence Theorem of Gauss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
C.4 Stokes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
C.5 Conservative Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
C.6 Proofs of Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768

C.1 Fundamental Theorem of Calculus


In this section we present the precise definition of the integrals and the proofs
of the fundamental theorem of calculus and related results.
Definition of integral
The integral
Z b
f (x) dx
a

of the function f (which we assume to be bounded) over the interval [a, b] is


defined through an approximation procedure which involves Riemannian sums
s_∆ = Σ_{k=1}^n f(ξ_k)(x_k − x_{k−1}).   (C.1)

Here, ∆ is a partition of the interval [a, b] of the form a = x₀ < x₁ < · · · < xₙ = b,
and each point ξk lies somewhere in the subinterval [xk−1 , xk ]. Thus, the value
of s∆ depends on the choice of ∆ as well as on the choice of the points ξk .
One can estimate the influence of the latter choice through the notion of
oscillation of a bounded function. If I is a subset of the domain of f (here, of
the interval [a, b]), the oscillation of f on I is defined as the maximum possible
difference of two function values on I, that is,

osc_I(f) = max_{x,z∈I} |f(x) − f(z)|.   (C.2)

We remark that it may happen that this maximum does not exist (that is,
there are no points x, z ∈ I where a maximum value is attained). One then
replaces the maximum by the so-called supremum, which in this case is equal


to the smallest number η such that |f (x) − f (z)| ≤ η for all x, z ∈ I.


Let us now consider two different Riemannian sums for the same partition ∆,
s_∆ = Σ_{k=1}^n f(ξ_k)(x_k − x_{k−1}),   s̃_∆ = Σ_{k=1}^n f(ξ̃_k)(x_k − x_{k−1}).   (C.3)

We estimate their difference as

|s_∆ − s̃_∆| = |Σ_{k=1}^n f(ξ_k)(x_k − x_{k−1}) − Σ_{k=1}^n f(ξ̃_k)(x_k − x_{k−1})|
            ≤ Σ_{k=1}^n |f(ξ_k) − f(ξ̃_k)|(x_k − x_{k−1})
            ≤ Σ_{k=1}^n O_k(f)(x_k − x_{k−1}) =: V_∆(f),   (C.4)

where Ok (f ) denotes the oscillation of f on the subinterval I = [xk−1 , xk ].


The number V∆ (f ) defined in (C.4) is called the oscillation sum of f for the
partition ∆.
Definition C.1. [Integrable function] A bounded function f : [a, b] → R is
said to be integrable on [a, b], if for every ε > 0 there exists a partition ∆
of [a, b] such that V∆ (f ) ≤ ε.
The latter condition means that, in view of (C.4), we can enforce the
difference between different Riemannian sums for the same partition ∆ to
become as small as we want, if we choose ∆ fine enough.
Let f : [a, b] → R be integrable. In order to define its integral, one goes
through the following three steps.
1. One proves that

|s_∆ − s_{∆̃}| ≤ V_∆(f),

whenever the partition ∆̃ is a refinement of the partition ∆ (that is, ∆̃ is obtained from ∆ by adding partition points), for all Riemannian sums s_∆ and s_{∆̃} for those partitions.

2. One proves that

|s_∆ − s_{∆̃}| ≤ V_∆(f) + V_{∆̃}(f),

for arbitrary partitions ∆ and ∆̃ and all Riemannian sums for those partitions.
3. One proves that there exists a unique number I such that

|I − s∆ | ≤ V∆ (f ) (C.5)

holds for all partitions and all corresponding Riemannian sums.



We then define the integral of f on [a, b] as

∫_a^b f(x) dx = I,

where I is the number obtained in step 3 above.


First we prove that every continuous function on a closed interval [a, b] is integrable. The proof uses the notion of uniform continuity. A function f : [a, b] → R is uniformly continuous on [a, b] if for every ε > 0 there exists δ > 0 such that |f(x) − f(z)| < ε holds for all x, z ∈ [a, b] with |x − z| < δ. This is similar to, but not the same as, the definition of continuity. It is a theorem (not treated in this book) that every continuous function defined on a closed and bounded interval [a, b] is uniformly continuous on that interval.
Let f : [a, b] → R be continuous. According to the definition of integrable
function for an arbitrarily given ε > 0 we have to find a partition ∆ such that
V∆ (f ) < ε. Choose δ > 0 small enough such that |f (x) − f (z)| < ε/(b − a)
whenever |x − z| < δ. (This is possible since, by what we said above, f is also
uniformly continuous.) Next, choose a natural number n large enough such
that (b − a)/n < δ. Take as partition ∆ the equidistant partition of [a, b] with
xk − xk−1 = (b − a)/n. Then the oscillation Ok (f ) of f on [xk−1 , xk ] satisfies
Ok (f ) ≤ ε/(b − a) for every k and
V_∆(f) = Σ_{k=1}^n O_k(f)(x_k − x_{k−1}) ≤ n · (ε/(b − a)) · ((b − a)/n) = ε.

Therefore f is integrable.
Let us remark that there are different ways of defining the integral. We
have chosen a method which can be generalized conveniently to the cases of
double and triple integrals.
Fundamental theorem
Let f be a continuous function on [a, b]. Then

F(x) = ∫_a^x f(t) dt

is differentiable at every point of [a, b], and

(d/dx) ∫_a^x f(t) dt = f(x).

Also, ∫_a^b f(x) dx = F(b) − F(a).
Proof. Since f is continuous, it attains its maximum and minimum on [a, b]. Let

M = max_{x∈[a,b]} f(x),   m = min_{x∈[a,b]} f(x).

Since m ≤ f(x) ≤ M for all x ∈ [a, b], we have

m(b − a) = ∫_a^b m dx ≤ ∫_a^b f(x) dx ≤ ∫_a^b M dx = M(b − a).

We divide by b − a and obtain

m ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ M.

By the intermediate value theorem, f attains every value between m and M. Therefore there exists c ∈ [a, b] such that

f(c) = (1/(b − a)) ∫_a^b f(x) dx.

To establish the first part of this theorem, we must show that if x is an arbitrary point in [a, b], then F′(x) = f(x); that is,

lim_{h→0} (F(x + h) − F(x))/h = f(x).
In order to prove this, fix x ∈ [a, b] and let h be any number such that x + h ∈ [a, b]. Using the definition of F together with the additivity of the integral yields

F(x + h) − F(x) = ∫_a^{x+h} f(t) dt − ∫_a^x f(t) dt
                = ∫_a^x f(t) dt + ∫_x^{x+h} f(t) dt − ∫_a^x f(t) dt
                = ∫_x^{x+h} f(t) dt.

Consequently, if h ≠ 0, then

(F(x + h) − F(x))/h = (1/h) ∫_x^{x+h} f(t) dt.

In the case h > 0, by the mean value theorem for integrals, we can find a number z = z(h) in the open interval (x, x + h) such that

∫_x^{x+h} f(t) dt = f(z(h)) · (x + h − x) = f(z(h)) · h

and, therefore,

(F(x + h) − F(x))/h = f(z(h)).   (C.6)
Since x < z(h) < x + h, we have lim_{h→0⁺} z(h) = x. It then follows from the continuity of f that

lim_{h→0⁺} f(z(h)) = f(x),

and we conclude from (C.6) that

lim_{h→0⁺} (F(x + h) − F(x))/h = f(x).

If h < 0, we may prove in a similar way that

lim_{h→0⁻} (F(x + h) − F(x))/h = f(x).

The two preceding one-sided limits imply that

F′(x) = lim_{h→0} (F(x + h) − F(x))/h = f(x).

This completes the proof of the first part.
Let F be any antiderivative of f and let

G(x) = ∫_a^x f(t) dt.   (C.7)

By well known results of calculus we can show that

G(x) = F(x) + C

for every x in [a, b]. Together with (C.7) this implies that

∫_a^x f(t) dt = F(x) + C

for every x in [a, b]. If we let x = a and use the fact that ∫_a^a f(t) dt = 0, we obtain 0 = F(a) + C, or C = −F(a). If we let x = b, we arrive at

∫_a^b f(t) dt = F(b) + C = F(b) − F(a).

(Recall that whether the integration variable is denoted by x or by t is irrelevant.)
Integral test
For a continuous, positive, non-increasing function f on [1, ∞), the series Σ_{n=1}^∞ f(n) converges if and only if the improper integral ∫_1^∞ f(x) dx converges.

Proof. First, assume that ∫_1^∞ f(x) dx converges. We define a function ϕ : [1, ∞) → R by setting ϕ(x) = f(n) if n − 1 ≤ x < n. Since f is non-increasing, we have ϕ(x) ≤ f(x) for all x ≥ 1. For every natural number N ≥ 2 we therefore obtain, using properties of the integral,

0 ≤ Σ_{n=2}^N f(n) = Σ_{n=2}^N ∫_{n−1}^n ϕ(x) dx = ∫_1^N ϕ(x) dx ≤ ∫_1^N f(x) dx ≤ ∫_1^∞ f(x) dx < ∞.
P∞ R∞
Thus, the partial sums of the series n=1 f (n) are bounded by P f (x) dx,
1 ∞
and hence the series converges. To prove the converse, assume that n=1 f (n)
converges. We define a function ψ : [1, ∞) → R by setting ψ(x) = f (n − 1)
if n − 1 ≤ x < n. Since f is non-increasing, we have ψ(x) ≥ f (x) for all
x ≥ 1. For every natural number N ≥ 2 we therefore obtain, using property
of integral,
Z N Z N N Z
X n N
X
0≤ f (x) dx ≤ ψ(x) dx = ψ(x) dx = f (n − 1)
1 1 n=2 n−1 n=2

(C.8)
X
≤ f (n) < ∞ .
n=1

The function F : [1, ∞) → R defined by

F(t) = ∫_1^t f(x) dx

is non-decreasing and, due to (C.8), bounded by the finite number Σ_{n=1}^∞ f(n). Therefore the limit lim_{t→∞} F(t) exists, which means that the improper integral converges.
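The comparison used in the proof can be observed concretely for f(x) = 1/x², where ∫_1^∞ dx/x² = 1 and the series sums to π²/6. This added sketch checks the bound Σ_{n=2}^N f(n) ≤ ∫_1^N f(x) dx numerically.

```python
import math

# f(x) = 1/x²: ∫_1^N dx/x² = 1 − 1/N, and the comparison from the
# proof gives Σ_{n=2}^N 1/n² ≤ ∫_1^N dx/x² ≤ 1, so the partial sums
# are bounded and the series converges (to π²/6).
N = 10000
tail_sum = sum(1.0 / (n * n) for n in range(2, N + 1))
integral_1_to_N = 1.0 - 1.0 / N

bounded = tail_sum <= integral_1_to_N <= 1.0
partial_sum = 1.0 + tail_sum    # Σ_{n=1}^N 1/n²
```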

C.2 Green-Ostrogradski Theorem


Theorem C.1. Let D be a bounded domain in the plane whose boundary C is a closed, simple and positively oriented curve. Let F = (f, g) be a vector field whose components are continuously differentiable. Then

∮_C f(x, y) dx + g(x, y) dy = ∫∫_D (∂g/∂x − ∂f/∂y) dA.   (C.9)
Proof. In this proof we restrict ourselves to the case where the region D
has the form indicated in Figure C.1. First, we consider the case g = 0. The
boundary C of D consists of a lower part C1 which is the graph of some
function y = h(x), and of an upper part C2 which is the graph of some
function y = k(x); again we refer to Figure C.1. The line integral becomes
∮_C F · dr = ∫_{C₁} F · dr + ∫_{C₂} F · dr,

where, due to the counterclockwise orientation, C₁ is traversed from left to right, and C₂ from right to left. We have

∫_{C₁} F · dr = ∫_a^b f(x, h(x)) dx,

and analogously we obtain (note the reversal of the integration limits)

∫_{C₂} F · dr = ∫_b^a f(x, k(x)) dx.

Taken together, the formulas above yield

∮_C F · dr = ∫_a^b [f(x, h(x)) − f(x, k(x))] dx.   (C.10)

Now we consider the double integral. The region D has the form

D = {(x, y) : a ≤ x ≤ b , h(x) ≤ y ≤ k(x)} .

We compute the double integral on the right hand side of (C.9) for g = 0,
using the fundamental theorem of calculus,
∫∫_D −(∂f/∂y)(x, y) dA = −∫_a^b ∫_{h(x)}^{k(x)} (∂f/∂y)(x, y) dy dx
                       = −∫_a^b [f(x, k(x)) − f(x, h(x))] dx.

From (C.10) we see that in the case g = 0

∮_C F · dr = ∮_C f(x, y) dx = −∫∫_D (∂f/∂y)(x, y) dA.   (C.11)

An analogous proof for the case f = 0, taking F̃ = (0, g), shows that

∮_C F̃ · dr = ∮_C g(x, y) dy = ∫∫_D (∂g/∂x)(x, y) dA.   (C.12)

(In that proof, we decompose the boundary C of D into a left part and a right
part, described by some functions x = h̃(y) and x = k̃(y), respectively.) For
the general case, we decompose (f, g) = (f, 0) + (0, g), and add (C.11) and
(C.12). The theorem is proved for domains D of the form in Figure C.1.

For more general domains D, the two-dimensional variant of the divergence


theorem of Gauss can be used conveniently to prove Theorem C.1.
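Identity (C.9) can be checked for a concrete field. This added sketch (not from the text) takes f(x, y) = −y, g(x, y) = x on the unit square, where ∂g/∂x − ∂f/∂y = 2, so the double integral is 2 · area(D) = 2; the line integral is accumulated side by side with a midpoint rule.

```python
# Green-Ostrogradski check for F = (f, g) = (−y, x) on D = [0,1] × [0,1].
def line_integral(n=20000):
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += (-0.0) * h      # bottom (t, 0): f(t, 0) dx, dx = h
        total += 1.0 * h         # right (1, t):  g(1, t) dy, dy = h
        total += (-1.0) * (-h)   # top (1−t, 1):  f(1−t, 1) dx, dx = −h
        total += 0.0 * (-h)      # left (0, 1−t): g(0, 1−t) dy, dy = −h
    return total

circulation = line_integral()
double_integral = 2.0 * 1.0      # (∂g/∂x − ∂f/∂y) = 2 over area 1
```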

C.3 The Divergence Theorem of Gauss


The Gauss divergence theorem, Theorem 42, asserts that

∫∫_Σ F · n dσ = ∫∫∫_D div F dV.   (C.13)

FIGURE C.1: Domain for Green-Ostrogradski Theorem: the region between the graphs y = h(x) (lower boundary) and y = k(x) (upper boundary) for a ≤ x ≤ b.

D is a domain in R3 with boundary Σ and outer unit normal field n. We


present a proof of Theorem 42 for the special case where D and Σ can be rep-
resented, with respect to the coordinate directions x, y and z, in the following
way. With respect to the z-coordinate, we assume that D has the form
D = {(x, y, z) : k(x, y) < z < h(x, y) , (x, y) ∈ Dz } , (C.14)
where Dz = pz (D) with pz (x, y, z) = (x, y, 0) being the projection of D onto
the xy-plane, and k and h are certain functions. Moreover, we assume that the
boundary surface Σ of D consists of an upper part Σ+ described as z = h(x, y),
and a lower part Σ− described as z = k(x, y), and possibly a part Σ0 which is
vertical to the xy-plane (that is, the normals to Σ0 are perpendicular to the
z-direction).
Before proceeding further, let us illustrate this situation with two exam-
ples. If D is a ball bounded by a sphere Σ, the surfaces Σ+ and Σ− are the
upper and lower half spheres, respectively, while Σ0 is empty. If D is a rect-
angular box parallel to the coordinate axes, Σ+ and Σ− are the rectangles
forming the top and the bottom, respectively, while Σ0 consists of the four
vertical sides.
We assume moreover that D admits analogous representations with respect
to the x- and y-coordinates.
Proof. Recall that for F = f₁i + f₂j + f₃k, the divergence is given by

div F = ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z.

Due to the form of D described above, we can decompose a volume integral over D into an outer integral over the two-dimensional projection D_z and an inner integral over intervals in the z direction. We compute (using Fubini's theorem and the fundamental theorem of calculus):

∫∫∫_D (∂f₃/∂z) dV = ∫∫_{D_z} [ ∫_{k(x,y)}^{h(x,y)} (∂f₃/∂z)(x, y, z) dz ] dx dy
                  = ∫∫_{D_z} [ f₃(x, y, h(x, y)) − f₃(x, y, k(x, y)) ] dx dy   (C.15)
                  = ∫∫_{D_z} f₃(x, y, h(x, y)) dx dy − ∫∫_{D_z} f₃(x, y, k(x, y)) dx dy.

The surface integral of F · n = f₁n₁ + f₂n₂ + f₃n₃ decomposes into

∫∫_Σ F · n dσ = ∫∫_Σ f₁n₁ dσ + ∫∫_Σ f₂n₂ dσ + ∫∫_Σ f₃n₃ dσ.   (C.16)

For the third term on the right hand side we consider the partition of Σ described above,

∫∫_Σ f₃n₃ dσ = ∫∫_{Σ⁺} f₃n₃ dσ + ∫∫_{Σ⁻} f₃n₃ dσ + ∫∫_{Σ⁰} f₃n₃ dσ.   (C.17)

On Σ⁰, the unit outer normal n is perpendicular to the z-direction, so n₃ = 0 on Σ⁰ and the corresponding integral vanishes. We consider Σ⁺. Since Σ⁺ is given as z = h(x, y) and D lies below Σ⁺, the unit outer normal n points upward. At the point (x, y, z) with z = h(x, y) it is given by

n(x, y, z) = (1/ν)(−∂ₓh i − ∂ᵧh j + k),   ν = √(1 + (∂ₓh)² + (∂ᵧh)²),   (C.18)

where the partial derivatives ∂ₓh and ∂ᵧh are evaluated at (x, y). We thus obtain

n₃ = 1/ν   (C.19)

for its third component. We transform the surface integral over Σ⁺ into a two-dimensional integral over the region D_z, according to Definition C.1:

∫∫_{Σ⁺} f₃n₃ dσ = ∫∫_{Σ⁺} (f₃/ν) dσ = ∫∫_{D_z} (f₃/ν) √(1 + (∂ₓh)² + (∂ᵧh)²) dA
                = ∫∫_{D_z} f₃(x, y, h(x, y)) dx dy.   (C.20)

The surface integral over Σ⁻ is treated analogously; there, however, the outer normal vector points downward, so we have n₃ = −1/ν instead of (C.19). Consequently,

∫∫_{Σ⁻} f₃n₃ dσ = −∫∫_{D_z} f₃(x, y, k(x, y)) dx dy.   (C.21)

Putting together (C.15), (C.17), (C.20) and (C.21) we arrive at

∫∫_Σ f₃n₃ dσ = ∫∫∫_D (∂f₃/∂z) dV.   (C.22)

Working with the representations of D with respect to the x- and y-coordinates, one obtains in an analogous manner

∫∫_Σ f₁n₁ dσ = ∫∫∫_D (∂f₁/∂x) dV,   ∫∫_Σ f₂n₂ dσ = ∫∫∫_D (∂f₂/∂y) dV.   (C.23)

Adding the three equations in (C.22) and (C.23) yields (C.13). This completes
the proof for the special form of D as considered.
To prove the divergence theorem for domains D of general form, one employs so-called partitions of unity of the vector field F, which reduce the general situation to one where computations similar to those in the proof above can be carried out. In addition, let us remark that while we have treated the situation in three-dimensional space, more or less the same proof works for an n-dimensional region D bounded by an (n − 1)-dimensional surface Σ, where n is an arbitrary integer greater than or equal to 2. Both of these developments are, however, outside the scope of this book.
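Formula (C.13) can be checked for the box case treated in the proof. This added sketch (the helper `flux_unit_cube` is my own) takes F = (x, y, z) on the unit cube, where div F = 3 and hence the volume integral is 3; the flux is summed face by face with midpoint rules.

```python
# Divergence theorem check for F = (x, y, z) on the unit cube [0,1]³.
def flux_unit_cube(F, n=100):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u = (i + 0.5) * h
            v = (j + 0.5) * h
            dA = h * h
            # Outward flux through opposite pairs of faces.
            total += F(1.0, u, v)[0] * dA - F(0.0, u, v)[0] * dA  # x-faces
            total += F(u, 1.0, v)[1] * dA - F(u, 0.0, v)[1] * dA  # y-faces
            total += F(u, v, 1.0)[2] * dA - F(u, v, 0.0)[2] * dA  # z-faces
    return total

flux = flux_unit_cube(lambda x, y, z: (x, y, z))
volume_integral = 3.0   # div F = 3 integrated over unit volume
```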

C.4 Stokes Theorem


This section is devoted to the proof of the theorem of Stokes. It states that, under suitable assumptions,

∮_C F · dr = ∫∫_Σ (curl F) · n dσ,   (C.24)

where F is a vector field, Σ is a surface with unit normal field n, and C is its boundary curve, suitably oriented.
Proof of Theorem 45. The strategy of the proof is to transform the situation to the xy-plane, in order to apply the Green-Ostrogradski theorem. Let us first consider the surface integral on the right hand side of (C.24). The surface Σ is described as z = S(x, y), that is, as the graph of a function S defined on D, the projection of Σ onto the xy-plane. The unit normal at a point (x, y, S(x, y)) of Σ is given by

n = (1/ν)(−∂ₓS i − ∂ᵧS j + k),   ν = √(1 + (∂ₓS)² + (∂ᵧS)²),   (C.25)

where the right hand side is evaluated at (x, y). Since

curl F = ∇ × F = det [ i    j    k
                       ∂ₓ   ∂ᵧ   ∂_z
                       f₁   f₂   f₃ ]   (C.26)
       = (∂ᵧf₃ − ∂_z f₂)i + (∂_z f₁ − ∂ₓf₃)j + (∂ₓf₂ − ∂ᵧf₁)k,   (C.27)

the integrand of the surface integral becomes the scalar function

(curl F) · n = (1/ν)[(∂ᵧf₃ − ∂_z f₂)(−∂ₓS) + (∂_z f₁ − ∂ₓf₃)(−∂ᵧS) + (∂ₓf₂ − ∂ᵧf₁)],   (C.28)
to be evaluated at points of Σ. Using Definition 75 we can transform the surface integral into a double integral over the domain D,

∫∫_Σ (curl F) · n dσ = ∫∫_D (curl F) · n · ν dA
                     = ∫∫_D [(∂ᵧf₃ − ∂_z f₂)(−∂ₓS) + (∂_z f₁ − ∂ₓf₃)(−∂ᵧS) + (∂ₓf₂ − ∂ᵧf₁)] dA.   (C.29)

Let us now consider the line integral on the left side of (C.24). If C is parameterized by r : [a, b] → R³, then by the definition of the line integral we have

∮_C F · dr = ∫_a^b F(r(t)) · r′(t) dt.   (C.30)

Since Σ is the graph of S defined on D, C is the graph of S restricted to the boundary Γ of D. Let Γ be positively oriented by the parametrization q : [a, b] → R², and set

r(t) = q₁(t)i + q₂(t)j + S(q₁(t), q₂(t))k.   (C.31)

Using the chain rule we obtain

r′(t) = q₁′(t)i + q₂′(t)j + [∂ₓS · q₁′(t) + ∂ᵧS · q₂′(t)]k,   (C.32)

where ∂ₓS and ∂ᵧS are evaluated at q(t) = (q₁(t), q₂(t)). Let us now define the plane vector field F̃ by

F̃₁(x, y) = f₁(x, y, S(x, y)) + f₃(x, y, S(x, y)) ∂ₓS(x, y),
F̃₂(x, y) = f₂(x, y, S(x, y)) + f₃(x, y, S(x, y)) ∂ᵧS(x, y).   (C.33)

Setting x = q₁(t) and y = q₂(t), we see in view of (C.31) and (C.32) that

F(r(t)) · r′(t) = F̃(q(t)) · q′(t)   (C.34)



for all t ∈ [a, b]. (This is the reason for defining F̃ by (C.33).) The theorem of Green and Ostrogradski, Theorem C.1, asserts that

∮_Γ F̃ · dr = ∫∫_D (∂ₓF̃₂ − ∂ᵧF̃₁) dA.   (C.35)

We compute the partial derivatives from (C.33) with the aid of the chain rule,

∂ᵧF̃₁ = ∂ᵧf₁ + ∂_z f₁ · ∂ᵧS + (∂ᵧf₃ + ∂_z f₃ · ∂ᵧS) ∂ₓS + f₃ · ∂ᵧ∂ₓS,
∂ₓF̃₂ = ∂ₓf₂ + ∂_z f₂ · ∂ₓS + (∂ₓf₃ + ∂_z f₃ · ∂ₓS) ∂ᵧS + f₃ · ∂ₓ∂ᵧS.   (C.36)

Since the function S was assumed to have continuous second partial deriva-
tives, we can interchange their order, so we have ∂x ∂y S = ∂y ∂x S. Thus we
obtain from (C.36) that

∂ₓF̃₂ − ∂ᵧF̃₁ = (∂_z f₂ − ∂ᵧf₃) · ∂ₓS + (∂ₓf₃ − ∂_z f₁) · ∂ᵧS + (∂ₓf₂ − ∂ᵧf₁).   (C.37)

We compare this expression with the corresponding one in (C.29) and find that

∫∫_D (∂ₓF̃₂ − ∂ᵧF̃₁) dA = ∫∫_Σ (curl F) · n dσ.   (C.38)
We now put together the previous calculations and finally conclude that

∮_C F · dr = ∫_a^b F(r(t)) · r′(t) dt = ∫_a^b F̃(q(t)) · q′(t) dt
           = ∮_Γ F̃ · dr = ∫∫_D (∂ₓF̃₂ − ∂ᵧF̃₁) dA
           = ∫∫_Σ (curl F) · n dσ.

The proof is complete.
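Identity (C.24) can be checked on a standard example, not taken from the text: for F = (−y, x, 0) and Σ the unit disk in the xy-plane, curl F = (0, 0, 2) and n = k, so the surface integral is 2π; the circulation around the positively oriented unit circle should agree (the helper `circulation` is an assumed name).

```python
import math

# Line integral of F = (−y, x, 0) around the unit circle, with
# r(t) = (cos t, sin t, 0), r'(t) = (−sin t, cos t, 0), via midpoint sums.
def circulation(n=20000):
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        Fx, Fy = -math.sin(t), math.cos(t)
        total += (Fx * (-math.sin(t)) + Fy * math.cos(t)) * h
    return total

line_integral = circulation()
surface_integral = 2 * math.pi   # ∬ (curl F)·n dσ = 2 · area of unit disk
```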

C.5 Conservative Fields


In this section, we prove Theorems 36 and 37 concerning conservative vector fields.
Proof of Theorem 36.
Let r : [a, b] → R³ be a parametrization of C, so A = r(a) and B = r(b). We set g(t) = ψ(r(t)). From the chain rule we get

g′(t) = ∇ψ(r(t)) · r′(t).



We compute

∫_C F · dr = ∫_a^b F(r(t)) · r′(t) dt = ∫_a^b ∇ψ(r(t)) · r′(t) dt
           = ∫_a^b g′(t) dt = g(b) − g(a) = ψ(r(b)) − ψ(r(a))
           = ψ(B) − ψ(A).

In the middle line of this computation we have used the fundamental theo-
rem of calculus. Indeed, one may view Theorem 36 as a generalization of the
fundamental theorem of calculus to line integrals.
Proof of Theorem 37.
Since we already know that every conservative vector field is circulation free,
it remains to prove that every vector field, which is circulation free, is con-
servative. Let F be circulation free. We first show that F has the property of
path independence. If C1 and C2 are two curves with initial point A and end
point B, let C denote the curve which first connects A to B via C1 and then
B to A via C2 in the opposite direction (the latter curve we denote by −C2 ).
Since F is circulation free,

0 = ∮_C F · dr = ∫_{C1} F · dr + ∫_{−C2} F · dr = ∫_{C1} F · dr − ∫_{C2} F · dr,

thus

∫_{C1} F · dr = ∫_{C2} F · dr.

Therefore, the line integral is path independent. We now fix a point P in D
and define a function ψ by

ψ(x) = ∫_{Cx} F · dr,    x ∈ D,    (C.39)

where Cx is a curve which connects P to x. (Since the line integral is path
independent, it does not matter which curve we choose.) We claim that ψ is
a potential for F in D. To this end, fix x ∈ D and choose h > 0 so small that
the line segment L from x to x + he1 lies in D, where e1 denotes the unit
vector in the x-direction. Since we obtain a curve from P to x + he1 by first
traversing Cx and then L, we see from the definition of ψ that
ψ(x + he1) = ψ(x) + ∫_L F · dr.    (C.40)
We parametrize L by r(t) = x + te1 with 0 ≤ t ≤ h; then r′(t) = e1 and

∫_L F · dr = ∫_0^h F(r(t)) · e1 dt = ∫_0^h f1(r(t)) dt.

Therefore

(ψ(x + he1) − ψ(x))/h = (1/h) ∫_0^h f1(r(t)) dt.

Since the limit of the right hand side exists as h → 0 and is equal to f1 (r(0)) =
f1 (x), we obtain
∂ψ/∂x (x) = f1(x),    x ∈ D.
An analogous argument works for the other coordinate directions, so we finally
conclude that ∇ψ = F. The proof is complete.
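Theorem 36 can also be checked numerically. The sketch below uses an illustrative potential ψ(x, y, z) = x² + yz and curve r(t) of our own choosing (neither comes from the text), approximates the line integral of ∇ψ along r by the trapezoidal rule, and compares it with ψ(B) − ψ(A):

```python
import math

# Illustrative check of Theorem 36: for F = grad(psi), the line integral
# over a curve from A = r(0) to B = r(1) equals psi(B) - psi(A).
def psi(x, y, z):
    return x**2 + y * z

def grad_psi(x, y, z):
    return (2 * x, z, y)

def r(t):            # an arbitrary smooth curve
    return (math.cos(t), math.sin(t), t**2)

def r_prime(t):      # derivative of r, computed by hand
    return (-math.sin(t), math.cos(t), 2 * t)

# Trapezoidal approximation of the line integral over t in [0, 1]
N = 2000
h = 1.0 / N
total = 0.0
for k in range(N + 1):
    t = k * h
    F = grad_psi(*r(t))
    rp = r_prime(t)
    integrand = sum(f * d for f, d in zip(F, rp))
    w = 0.5 if k in (0, N) else 1.0
    total += w * integrand * h

exact = psi(*r(1.0)) - psi(*r(0.0))
print(abs(total - exact) < 1e-6)
```

The agreement is independent of the particular curve chosen, which is exactly the path-independence statement proved above.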

C.6 Proofs of Properties of Determinants


Proof of Theorem 6: It is a consequence of the result that det A = −det B,
where B is obtained from A by interchanging two rows or columns. We prove
it for 2 × 2 matrices.
Let A = [a11 a12; a21 a22] and let B = [a21 a22; a11 a12] be the matrix obtained from A by interchanging its rows. Then

det A = a11a22 − a12a21,
det B = a21a12 − a11a22,

that is, det A = −det B.
If two rows or columns are the same, then det A = −det A, implying det A = 0.
Proof of Theorem 7: It follows immediately by definition. In fact, detA = 0
if A has a zero row.
Proof of Theorem 9: Suppose the entries in the ith row of A are multiplied
by the number k, and call the resulting matrix B. Expanding det B by cofactors along the ith row, we obtain

det B = kai1Ci1 + kai2Ci2 + · · · + kainCin
      = k(ai1Ci1 + ai2Ci2 + · · · + ainCin)
      = k det A.

Proof of Theorem 11: We prove it for a 3 × 3 lower triangular matrix

A = [a11 0 0; a21 a22 0; a31 a32 a33].

Expanding along the first row,

det A = a11(a22a33 − 0 · a32) = a11a22a33.
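These three determinant properties can be spot-checked numerically. The sketch below uses a hand-rolled 3 × 3 determinant and concrete matrices of our own choosing (not examples from the text):

```python
# Pure-Python check of Theorems 6, 9 and 11 on concrete 3x3 matrices.
def det3(m):
    # cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

A = [[2, 1, 3],
     [0, 4, 1],
     [5, 2, 6]]

# Theorem 6: interchanging two rows flips the sign of the determinant.
B = [A[1], A[0], A[2]]
assert det3(B) == -det3(A)

# Theorem 9: multiplying one row by k multiplies the determinant by k.
k = 7
C = [[k * x for x in A[0]], A[1], A[2]]
assert det3(C) == k * det3(A)

# Theorem 11: for a triangular matrix the determinant is the product
# of the diagonal entries.
L = [[3, 0, 0],
     [1, 4, 0],
     [2, 5, 6]]
assert det3(L) == 3 * 4 * 6
print("all determinant checks passed")
```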

Proof of Theorem 71 (three point theorem):
Let w = T(z) be the solution for w in terms of z and the six given points in
the equation

(w1 − w)(w3 − w2)(z1 − z2)(z3 − z) = (z1 − z)(z3 − z2)(w1 − w2)(w3 − w),

or equivalently

(w − w1)(w2 − w3) / [(w − w3)(w2 − w1)] = (z − z1)(z2 − z3) / [(z − z3)(z2 − z1)].
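Solving the second displayed identity for w gives an explicit formula for T(z). A minimal sketch, with six illustrative points of our own choosing:

```python
# The cross-ratio identity has the form (w - w1)/(w - w3) = k, where k is
# computable from z; solving for w gives w = (w1 - k*w3) / (1 - k).
def mobius_from_points(z1, z2, z3, w1, w2, w3):
    def T(z):
        s = ((z - z1) * (z2 - z3)) / ((z - z3) * (z2 - z1))
        k = s * (w2 - w1) / (w2 - w3)
        return (w1 - k * w3) / (1 - k)
    return T

# Illustrative points (not from the text): map 0, 1, 2 to i, 1, -i.
z1, z2, z3 = 0, 1, 2
w1, w2, w3 = 1j, 1, -1j
T = mobius_from_points(z1, z2, z3, w1, w2, w3)
print(abs(T(z1) - w1) < 1e-12, abs(T(z2) - w2) < 1e-12)
```

By construction T(z1) = w1 and T(z2) = w2 exactly; T(z3) = w3 in the limit, since the formula has a removable pole at z = z3.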
Appendix D
Basic Concepts in Medical Imaging
and Oil Exploration

D.1 Fundamental Steps in Digital Image Processing
D.2 Introduction to Medical Imaging
D.3 Core Data and Well Loggings

D.1 Fundamental Steps in Digital Image Processing


Image acquisition: The types of images in which we are interested are gener-
ated by the combination of an illumination source and reflection or absorption
of energy from that source. The illumination may originate from a source of
electromagnetic energy such as radar, infrared or X-ray energy. Sensors are
used to transform illumination energy into digital images. In a simple way in-
coming energy is transformed into a voltage by combination of input electrical
power and sensor material that is responsive to the particular type of energy
being detected. The output voltage waveform is the response of the sensor,
and a digital quantity is obtained from each sensor by digitizing its response.
Image enhancement reveals the details that are obscured or highlights
certain features of interest in an image.
Compression deals with techniques for reducing the storage required to
save an image. Image compression is familiar to most users of computers in
the form of image file extensions, such as .jpg.
Morphological processing deals with the tools for extracting image com-
ponents that are useful in the representation and description of shapes.
Segmentation partitions an image into its constituent parts.
Representation and description follow the output of a segmentation stage
which is usually raw pixel data constituting the boundary of a region or all
points in the region itself.
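Segmentation in its simplest form can be illustrated by intensity thresholding. The sketch below partitions a toy 5 × 5 grayscale array (our own data, not an image from the text) into foreground and background; real pipelines use adaptive thresholds and more sophisticated criteria:

```python
# Minimal segmentation sketch: partition pixels by an intensity threshold.
image = [
    [ 10,  12,  11, 200, 210],
    [  9,  14,  13, 205, 198],
    [ 11,  10, 202, 207, 199],
    [  8, 201, 204, 196, 203],
    [ 12,  11,  10,   9, 200],
]

threshold = 128
# 1 marks a bright (foreground) pixel, 0 a dark (background) pixel.
mask = [[1 if px >= threshold else 0 for px in row] for row in image]
foreground_pixels = sum(sum(row) for row in mask)
print(foreground_pixels)  # -> 12
```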


D.2 Introduction to Medical Imaging


Gamma-ray imaging: Major uses of imaging based on gamma rays include
nuclear medicine and astronomical observations. In nuclear medicine, the ap-
proach is to inject a patient with a radioactive isotope that emits gamma rays
as it decays. Images are produced from the emissions collected by gamma
ray detectors. The principle is the same as with X-ray tomography. However,
instead of using an external source of X-ray energy, the patient is given a
radioactive isotope that emits positrons as it decays. When a positron meets an
electron, both are annihilated; the image is created using the basic principles
of tomography.
A star in the constellation of Cygnus exploded about 15,000 years ago,
generating a superheated stationary gas cloud (known as the Cygnus loop)
that glows in a spectacular array of colors.
X-ray imaging: X-rays are among the oldest sources of electromagnetic
(EM) radiation used for imaging. The best known use of X-ray is medical
diagnostics, but X-rays are also used extensively in industry and other areas,
like astronomy. X-rays for medical and industrial imaging are generated us-
ing X-ray vacuum tubes with a cathode and anode. The cathode is heated,
causing free electrons to be released. These electrons flow at high speed to
the positively charged anode. When the electrons strike a nucleus, energy is
released in the form of X-ray radiation. The energy (penetrating power) of the
X-rays is controlled by a voltage applied across the anode, and the number
of X-rays is controlled by a current applied to the filament in the cathode.
Imaging in ultraviolet band: Applications of ultraviolet light are varied.
They include lithography, industrial inspection, microscopy, lasers, biological
imaging and astronomical observation. We illustrate imaging in this band with
examples from microscopy and astronomy. Ultraviolet light is used in fluores-
cence microscopy, one of the fastest growing areas of microscopy. Fluorescence
is a phenomenon discovered in the middle of the 19th century with the observation that the mineral fluorspar fluoresces when ultraviolet radiation is
directed on it.
radiation collides with an electron in an atom of a fluorescent material. The
photon elevates the electron to a higher energy level. The excited electron
relaxes to a lower level and emits light as a lower energy photon in the visible
(red) light region. A fluorescence microscope uses excitation light to irradiate
a prepared specimen and then separates the much weaker radiation fluores-
cent light from brighter excitation light.
Imaging in visible and infrared bands: Imaging in these bands is useful
in various disciplines because of its wide scope and number of applications.
The infrared band is used in conjunction with visual imaging in areas such
as light microscopy, astronomy, remote sensing, industrial operations, law enforcement, and other fields that utilize images to generate enhancements and
measurements.
EEG: An electroencephalogram (EEG) measures and records the electrical
activity of the brain. Special sensors (electrodes) are attached to the head and
hooked by wires to a computer. The computer records brain electrical activity
on a screen or on paper as wavy lines. Certain conditions, such as seizures,
can be seen by the changes in the normal pattern of brain electrical activity.
ECG: The electrocardiogram (ECG) records the electrical activity of the
heart. Each heart beat is displayed as a series of electrical waves characterized
by peaks and valleys. An ECG reveals (1) the duration of the electrical waves
crossing the heart, which determines whether the electrical activity is normal,
slow or irregular, and (2) the amount of electrical activity passing through
the heart muscle, which helps determine whether parts of the heart are too
large or overworked. Normally, the frequency range of an ECG signal is
0.05 to 100 Hz and its dynamic range is 1 to 10 mV.
MRI: Magnetic resonance imaging (MRI) uses a magnetic field and pulses of
radio wave energy to depict organs and structures inside the body. In many
cases MRI gives different information about the body than can be seen with
an X-ray, ultrasound, or computed tomography (CT) scan. For more details
we refer to [8] and [9].

D.3 Core Data and Well Loggings


A petroleum reservoir is composed of hydrocarbons in porous rock forma-
tions. The crude oil found in the petroleum reservoir forms from the remains
of organisms. As we know, a hydrocarbon is a chemical compound consisting
of the elements carbon (C) and hydrogen (H).
Hydrocarbons contain backbones of carbon atoms, known as carbon skeletons, with hydrogen atoms attached to the backbones. Hydrocarbons, which are
combustible, are the main components of fossil fuels, which include petroleum,
coal and natural gas. Petroleum is nothing but crude oil.
An oil field is a region with an abundance of oil wells extracting crude
oil (petroleum) from below ground. There are more than 40,000 oil fields
scattered around the globe. The largest are the Ghawar field in Saudi Arabia
and the Burgan oil field in Kuwait. Famous companies engaged in the service of
creating infrastructure and providing specialized services to operate a field
profitably are Schlumberger, Esso, Bechtel, Baker-Hughes, Weatherford and
Halliburton.
Core data is obtained using special hollow drill bits. Core data from rock
provides information for indirect measurements of seismic and well logs. Sam-
ples can be taken from the cores and measurements are made in the laboratory.

One of the most important uses of core data is to help geologists identify
depositional environment as a function of distance along the wells. Well logs
record parameters such as spontaneous potential (SP), resistivity logs, dielec-
tric logs, passive gamma-ray logs, active gamma-ray logs, neutron logs, NMR
logs, sonic logs and dip meter logs. Thus, a well log is a set of time series data.
These logs are very well presented in Iske and Randen [1]. For detailed dis-
cussion one may refer to Luthi [2] and Selly [3]. An intensive review of reservoir
modelling is also given in [2] while [3] deals with the updated petroleum geol-
ogy.
The procedure of well logging involves sending a package of instruments,
known as a sonde, suspended from a cable or wire, into the well. Well
logging is used to determine the type and properties of any fluid in the rocks
and to determine the geometric and physical properties of the rock. A major
aim of well logging is to tie horizons observed in wells to seismic horizons. Well logging
helps to locate and quantify potential depth zones containing hydrocarbons.
This appendix discusses applications of wavelet methods to well logs for de-
termination and interpretation of cyclicity, zonation and abrupt changes in
sedimentary successions. Stratigraphy is a branch of geology studying rock
layers and layering. Sequence stratigraphy is the study of cyclic sedimentary
deposits. Gamma rays, porosity and bulk density are the most important
properties in oil exploration.
Appendix E
Solution of Odd Number Exercises

E.1 Exercises

E.1 Exercises

Exercise Set - Chapter 1

1.1 (i) (2, 4, 12)


(ii) (3, −19, −2)

(vi) 22 14
1.3 (i) 12 (ii) 72
1.5 (i) α = 9

1.7 ‖a‖ = 3√2, cos θ = 1/√5

1.9 ‖αa + βb‖² = ⟨αa + βb, αa + βb⟩ = α²⟨a, a⟩ + β²⟨b, b⟩ + 2αβ⟨a, b⟩ = α²‖a‖² + β²‖b‖² + 2αβ a·b

1.11 a.(a × b) = 0 and b.(a × b) = 0


1.13 Linearly independent.
1.17 None of these matrices are equal.
1.19 c23 = 9, c12 = 0.
1.21 (i) (A^T)^T = [2 4; −3 2] = A
(ii) (A + B)^T = [6 −1; 14 7] and A^T + B^T = [6 −1; 14 7]
1.23 The given system of equations can be written in the matrix form AX = B, where A = [2 6 1; 1 2 −1; 5 7 −4], B = [7; −1; 9] and X = [x1; x2; x3].


1.25 (i) We have x1 = x3, 2x2 = x3 + 2x4, x2 = x3. We find that x1 = x2 = x3,
so the second equation becomes 2x1 = x1 + 2x4, or x1 = 2x4. A solution of the
system is x1 = x2 = x3 = 2t, x4 = t. Letting t = 1 we obtain the balanced
equation

2Na + 2H2O → 2NaOH + H2.

(ii) From x1C5H8 + x2O2 → x3CO2 + x4H2O we obtain the system 5x1 = x3, 8x1 = 2x4, 2x2 = 2x3 + x4. Choosing x1 = t we see that x3 = 5t, x4 = 4t
and x2 = 7t. Taking t = 1 we obtain the balanced equation

C5H8 + 7O2 → 5CO2 + 4H2O.
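The two balanced equations can be verified by counting atoms on each side. A minimal sketch, with formulas written as element-count dictionaries:

```python
# Verify the balanced equations from 1.25 by totalling atoms per side.
def total(side):
    out = {}
    for coeff, formula in side:
        for el, n in formula.items():
            out[el] = out.get(el, 0) + coeff * n
    return out

# (i) 2 Na + 2 H2O -> 2 NaOH + H2
lhs = [(2, {"Na": 1}), (2, {"H": 2, "O": 1})]
rhs = [(2, {"Na": 1, "O": 1, "H": 1}), (1, {"H": 2})]
assert total(lhs) == total(rhs)

# (ii) C5H8 + 7 O2 -> 5 CO2 + 4 H2O
lhs = [(1, {"C": 5, "H": 8}), (7, {"O": 2})]
rhs = [(5, {"C": 1, "O": 2}), (4, {"H": 2, "O": 1})]
assert total(lhs) == total(rhs)
print("both equations balance")
```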

1.27 (i) Interchange row 1 and row in I3 . It is elementary.


(ii) Multiply row 3 by c in I3 . It is elementary.
(iii) Add row 4 to row 1 in I4 . It is elementary.
1.29 Either use the definition of linear independence or the following theorem [6]:
Let S = {v1, v2, v3} be a subset of R3 and let M be the matrix whose columns
(rows) are the elements of S. Then the vectors of S are linearly independent
in R3 if det M ≠ 0. Here

det M = |1 2 3; 1 0 1; 1 −1 5| = 1(0 + 1) − 2(5 − 1) + 3(−1 − 0) = 1 − 8 − 3 = −10 ≠ 0,

hence the vectors (1, 2, 3), (1, 0, 1) and (1, −1, 5) are linearly independent.
1.31 detA = 48, and detB = 40
1.33 det A = a11a22 − a21a12. A^t = [a11 a21; a12 a22], so det A^t = a11a22 − a21a12. Hence det A = det A^t.
1.35 det A² = det A · det A = (det A)² = det I = 1, so det A = ±1.
 
1.37 A = [4 3; 3 2]
1.39 A⁻¹ = [sin θ −cos θ; cos θ sin θ]
1.41 (i) Eigenvalues are λ1 = 4, λ2 = −2. An eigenvector corresponding to λ1 = 4 is (0, 4)^t and an eigenvector corresponding to λ2 = −2 is (6, −1)^t.
(ii) λ1 = (3 + √14 i)/2, λ2 = (3 − √14 i)/2. An eigenvector corresponding to λ1 is (−1 + √47 i, 4)^t and an eigenvector corresponding to λ2 is (−1 − √47 i, 4)^t.
(iii) λ1 = λ2 = 0. There is only one associated eigenvector, (1, 0)^t.
(iv) λ1 = 0, corresponding eigenvector (0, 1, 0)^t; λ2 = 2, corresponding eigenvector (2, 1, 2)^t; λ3 = 3, corresponding eigenvector (0, 2, 3)^t.
(v) λ1 = 0, λ2 = 1, λ3 = 7. The eigenvector corresponding to λ1 is (14, 7, 10)^t, the eigenvector corresponding to λ2 is (6, 0, 5)^t, and the eigenvector corresponding to λ3 is (0, 0, 1)^t.
   
1.43 (i) The matrix P = [−1 3; 1 1] diagonalizes A = [5 3; 1 3].
(ii) A = [1 0; −4 1] is not diagonalizable.
(iii) M = [0 5 0; 1 1 −3; 0 0 2] diagonalizes A = [5 0 0; 1 0 3; 0 0 −2].
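The claim in 1.43(i) can be verified directly: computing P⁻¹AP with hand-rolled 2 × 2 arithmetic gives a diagonal matrix (in our computation the diagonal entries come out as 2 and 6):

```python
# Check that P diagonalizes A in 1.43(i): P^{-1} A P should be diagonal.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[5, 3], [1, 3]]
P = [[-1, 3], [1, 1]]
D = matmul(inv2(P), matmul(A, P))

# Off-diagonal entries should be (numerically) zero.
assert abs(D[0][1]) < 1e-12 and abs(D[1][0]) < 1e-12
print(D[0][0], D[1][1])
```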
1.45 See [6, pp. 483-486].
1.47 AX = λX gives Σ_{j=1}^n akjxj = λxk. Then |λ||xk| ≤ Σ_{j=1}^n akj|xj| ≤ Σ_{j=1}^n akj|xk| = |xk|, so |λ| ≤ 1.
1.49 See [10].
1.51 See [10].
778 Modern Engineering Mathematics

Exercise Set - Chapter 2

2.1 (a) Order 1 and linear


(b) First order and nonlinear
(e) Order 2 and linear
(f) Order 2 and nonlinear
2.3 y′ = −(1/2)e^{−x/2}, so −e^{−x/2} + e^{−x/2} = 0.

2.5 y′ = −2/x³, so x²(−2/x³) + 2x(1/x²) = 0.
2.7 y 0 = ex , ex = (ex + 1) − 1 = ex .
2.9 Let f(x, y) = √(xy); then ∂f/∂y = (1/2)√(x/y). The differential equation dy/dx = √(xy)
will have a unique solution in any region where x > 0 and y > 0, or where x < 0
and y < 0.
2.11 We have ∂f/∂y = 1 if we choose f(x, y) = x + y. Therefore, the differential
equation will have a unique solution in the entire plane.
2.13 For f(x, y) = x² cos y, ∂f/∂y = −x² sin y.
The equation has a unique solution in the entire plane.
2.15 (i) dy/dx = −e^{−3x}, or dy = −e^{−3x} dx, so y = (1/3)e^{−3x} + c.
(v) dN/N = (te^{t+2} − 1) dt, so ln|N| = te^{t+2} − e^{t+2} − t + c.
(vi)

dy/dx = (y − 1)(y + 1)/((x − 1)(x + 1)),

or

dy/((y − 1)(y + 1)) = dx/((x − 1)(x + 1)).

By partial fractions,

(1/2)[1/(y − 1) − 1/(y + 1)] dy = (1/2)[1/(x − 1) − 1/(x + 1)] dx,

so ln|y − 1| − ln|y + 1| = ln|x − 1| − ln|x + 1| + ln c, or

(y − 1)/(y + 1) = c(x − 1)/(x + 1).

Using y(2) = 2 we find c = 1. The solution of the given initial value problem is
(y − 1)/(y + 1) = (x − 1)/(x + 1), or y = x.
y+1 x+1
2.17 (i) dy/dx − xy = 2y, y(1) = 5.
2.19 (i) m² + 8m + 16 = 0. This gives m1 = m2 = −4, so the general solution is

y = c1e^{−4x} + c2xe^{−4x}.

(ii) m² − 4m + 5 = 0, m1 = 2 + i, m2 = 2 − i. The general solution is

y(x) = e^{2x}(c1 cos x + c2 sin x).

(iii) 2m² − 3m + 4 = 0, m = 3/4 ± (√23/4)i, so the general solution is

y(x) = e^{3x/4}(c1 cos(√23 x/4) + c2 sin(√23 x/4)).

2.21 (i) y = 2 sin 2x − 1/2. (ii) y = 200e^{−x/5} − 200 − 3x² + 30x.
2.23 (i) The auxiliary equation is m² − 8m + 41 = 0, so that

y = x⁴[c1 cos(5 ln x) + c2 sin(5 ln x)].

2.25 The desired differential equation is

dx/dt = r − kx(t), where k > 0.

2.27 The mathematical model is

dm/dt = km,

where m(t) is the mass at time t and k is the constant of proportionality
depending on the substance.
2.29 Let P(t) be the number of owls present at time t. Then

dP/dt = k(P − 200 + 10t).
2.31 Let S(t) denote the sales in rupees; then the desired model is

dS(t)/dt = k(P − Q),

where P is the original amount by which sales decline after removing the advertising budget.

2.33 The desired model is

dP(t)/dt = kP(t),  P(0) = P0,  P(2) = 2P0.
2.35 dP(t)/dt = λP(t), so P(t) = P(0)e^{λt}. From (3/2)P0 = P0e^λ we get λ = ln(3/2). Then 3P0 = P0e^{λt} gives 3 = e^{λt}, so

t = (1/λ) ln 3 = ln 3/(ln 3 − ln 2).

2.37 100,000 = 50,000e^λ, so λ = ln 2. Then P(2) = P0e^{2 ln 2} = 50,000e^{2 ln 2} = 200,000.

2.39 Similar to Example 126


2.41 Similar to Example 128.
2.43 Volume V depends on r³ and surface area S on r², where r is the radius
of the given spherical drop. Therefore the surface area depends on V^{2/3}:

V = k1r³,  S = k2r² = k2(V/k1)^{2/3} = kV^{2/3}.

The desired differential equation is

dV/dt = −cV^{2/3}, where c is a constant.
2.47 Similar to Example 140.
2.49 dT/dt = k(T − 10), so T = 10 + ce^{kt}. If T(0) = 70° and T(1/2) = 50°, then
c = 60 and k = 2 ln(2/3), so T(1) = 36.67°. If T(t) = 15°, then t = 3.06.
2.51 dI/dt + (R/L)I = V/L. Solve as a linear differential equation with integrating
factor e^{Rt/L}, using the initial condition. Check that I = (V/R)(1 − e^{−Rt/L}) satisfies
the differential equation.
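The check suggested in 2.51 can also be done numerically. The sketch below picks illustrative values of R, L and V (not given in the text) and verifies the residual of the ODE with a central-difference derivative:

```python
import math

# Check that I(t) = (V/R)(1 - e^{-Rt/L}) satisfies dI/dt + (R/L) I = V/L.
# R, L, V below are our own illustrative parameter choices.
R, L, V = 2.0, 0.5, 10.0

def I(t):
    return (V / R) * (1.0 - math.exp(-R * t / L))

h = 1e-6
for t in (0.1, 0.5, 1.0):
    dIdt = (I(t + h) - I(t - h)) / (2 * h)   # central difference
    residual = dIdt + (R / L) * I(t) - V / L
    assert abs(residual) < 1e-4
print("I(t) satisfies the RL equation at the sampled times")
```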
2.53 Let S be the desired distance. We can determine S by solving the differential equation dS/dt = At + B, where A and B are constants to be determined
from the given conditions.
2.55 lim_{t→∞} e^{−bt}t^n = lim_{t→∞} t^n/e^{bt} = lim_{t→∞} n!/(b^n e^{bt}) = 0 for any positive value of b. Hence t^n = O(e^{bt}) as t → ∞, and
t^n is of exponential order.
2.57 e^{−3t} cos 2t − (3/2)e^{−3t} sin 2t
2.59 y(t) = 2e^{3t} + 11te^{3t} + (1/12)t⁴e^{3t}
2.61 f (t) = 4e−4 − 7te−t + 4t2 e−t


2.63 f(t) = cos √2t cosh √2t + (1/(2√2))(sin √2t cosh √2t + cos √2t sinh √2t) + sin √2t cosh √2t

2.65 y1(t) = −10√2 sin √2t + (5/3) sin 2√3t
y2(t) = −5√2 sin √2t − (10/3) sin 2√3t
2.67 y(t) = (16/39)e^{−2t} + (13/60)e^t − (13/20)e^{−t} + (3/130) cos 3t − (2/130) sin 3t
2.69 [Ref [1] of Chapter 5, p. 87]. Apply principle of finite mathematical in-
duction.
2.71 y = c0y1(x) + c1y2(x), where
y1(x) = 1 + (1/2)x² + (1/6)x³ + (1/24)x⁴ + (1/30)x⁵ + ...
y2(x) = x + (1/6)x³ + (1/12)x⁴ + (1/120)x⁵ + ...

2.73 y = c0(1 − (1/2!)x² + (1/4!)x⁴ + ...) + c1(x − (1/3!)x³ + (1/5!)x⁵ + ...) + (1/2!)x² + (1/3!)x³ + ...

2.75 y1(x) = 1 + (1/2)x² + (1/6)x³ + (1/6)x⁴ + ...
y2(x) = x + (1/2)x² + (1/2)x³ + (1/4)x⁴ + ...

2.77 y = x − (1/3)x³ + (1/5)x⁵ − (1/7)x⁷ + ...
2.79 y = c1y1 + c2y2, where
y1 = c0(1 + (1/4)x² + (1/48)x⁴ + ...)
y2 = −(1/2)y1 ln x + y1[−1/(2x²) + (7/96)x² − (19/2304)x⁴ + ...]
2.83 Since ν = 0 in the Bessel equation, the general solution is c1J0(x) + c2Y0(x); see [13].
2.85 It is Legendre's equation of order 0. It can be seen from the solution of Legendre's equation in Section 2.8.5 that it is y0(x) = c0 = 1.

Exercise Set - Chapter 3

3.1 (i) Let x = (x1 , x2 , x3 ), y = (y1 , y2 , y3 ) and z = (z1 , z2 , z3 ). Then



(x + y) + z = [(x1 , x2 , x3 ) + (y1 , y2 , y3 )] + (z1 , z2 , z3 )


= (x1 , x2 , x3 ) + (y1 + z1 , y2 + z2 , y3 + z3 )
= x + (y + z)

(vi) Since

y · (x × y) = |y1 y2 y3; x1 x2 x3; y1 y2 y3|
= y1(x2y3 − x3y2) − y2(x1y3 − x3y1) + y3(x1y2 − x2y1)
= y1x2y3 − x3y2y1 − y2x1y3 + x3y1y2 + y3x1y2 − x2y1y3
= 0,

y is perpendicular to x × y, or x × y is perpendicular to y.
3.3 (a) (i) We have x0 = 4, y0 = 2, a = −1 and b = 5; therefore the parametric
equations are x = 4 − t, y = 2 + 5t.
(ii) x = 1 + 4t, y = 2 + 5t, z = −3 − 7t.
3.5 (a) We know that d/dt[F·G] = F·(dG/dt) + (dF/dt)·G. Taking F = G = γ yields

d/dt(γ(t)·γ(t)) = γ(t)·γ′(t) + γ′(t)·γ(t),

so d/dt[‖γ(t)‖²] = 2γ(t)·γ′(t).
Since ‖γ(t)‖ is constant, γ(t)·γ′(t) = 0. Therefore, γ(t) is orthogonal to γ′(t).

3.7 (b) We know that ∇·(F × G) = G·(∇ × F) − F·(∇ × G).
Choosing F = ∇φ, G = ∇ψ and using ∇ × ∇φ = 0, ∇ × ∇ψ = 0, we get

∇·(∇φ × ∇ψ) = ∇ψ·0 − ∇φ·0 = 0,

or div(∇φ × ∇ψ) = 0.

3.9

∫_C F·dr = ∫_π^0 (i − xj + k)·(−sin t i − cos t j) dt
= ∫_0^π (−sin t − x(t) cos t) dt, where x(t) = sin t
= ∫_0^π (−sin t − (1/2) sin 2t) dt
= [cos t]_0^π + [cos 2t]_0^π = 0
H
3.11 The work done is equal to ∮_C F·dr. Let D be the disk enclosed by the
circle C and let A(D) denote its area. Then

∮_C F·dr = ∮_C (e^x − y + x cosh x) dx + (y^{3/2} + x) dy
= ∬_D [∂/∂x(y^{3/2} + x) − ∂/∂y(e^x − y + x cosh x)] dA
(by the theorem of Green and Ostrogradski)
= ∬_D (1 + 1) dA = 2A(D) = 2π(12)² = 288π.

3.13 (a) Since div F = 0 for F = 2yzi − 4xzj + xyk, ∭_D div F dV = 0 for the ball D bounded by the sphere Σ.

3.15 By the Gauss divergence theorem,

∬_Σ C·n dσ = ∭_D div C dV = ∭_D 0 dV = 0,

as C is constant, div C = 0.
3.17 By Stokes' theorem, ∮_C F·dr = ∬_Σ (curl F)·n dσ, where n is a suitably
oriented normal to the disk x² + y² ≤ 1. We have

curl F = −zaj + (2xy + 1)k.

Since the disk is horizontal, it is parameterized by z = S(x, y) = 0 and
n = k. Therefore curl F·n = 2xy + 1 and √(1 + (∂xS)² + (∂yS)²) = √1 = 1.
We compute

∮_C F·dr = ∬_Σ (curl F)·n dσ = ∬_Σ 1 dσ = ∬_D 1 dA = π,

since x = y = 0 on Σ and the area of the disk D equals π.


3.19 It can be shown that the line integral of the given F is −(5/4)π and the surface
integral of the given vector field is also equal to −(5/4)π. This verifies Stokes'
theorem.

Exercise Set - Chapter 4

4.1 Yes. An orthogonal set can be converted into an orthonormal system by


dividing each vector by its magnitude.
4.3 (a) a0 = 2π, an = 0, bn = (2/n)(−1)^{n+1} and f(x) = π + Σ_{n=1}^∞ (2/n)(−1)^{n+1} sin nx.
4.5 The graph of the third partial sum of the Fourier series can be seen on the
screen of a computer using MATLAB or MATHEMATICA.
4.7 f(x) = Σ_{n=1}^∞ [ (2(π+1)/(nπ))(−1)^{n+1} + 2/(nπ) ] sin nπx

4.9 Part (a) is Bessel's inequality and its proof can be found in [3] of Chapter 4.
(c) It is the famous Riemann–Lebesgue lemma and its proof can be found in [1, 3].
4.13 Follows from the definition.
4.15 We find f̂ for f = χ_{[−1/2, 1/2]}:

f̂(s) = (2/s) sin(s/2).

Exercise Set - Chapter 5

Separation of variables method for heat equation


5.1 u(x, t) = (8/π) Σ_{n=1}^∞ [1/(2n−1)²] e^{−(2n−1)²t} sin(2n−1)x

5.3 u(x, t) = (2/π) Σ_{n=1}^∞ [((2n−1)π − 8(−1)^n)/(2n−1)²] e^{−(2n−1)²π²t/64} sin((2n−1)π/8)x

5.5 u(x, t) = (40/π) Σ_{n=1}^∞ [(1 − cos(nπ/2))/n] e^{−n²π²t/64} sin(nπ/2)x

Laplace transform for heat equation

5.7 u(x, t) = u1 + (u0 − u1) erfc( x/(2√t) )
Fourier transform for heat equation
5.9 u(x, t) = (1/2) erf((1 − x)/(2c√t)) + (1/2) erf((1 + x)/(2c√t))

5.11 u(x, t) = (1/(t√(2π))) ∫_{−∞}^∞ f(τ) e^{−(x−τ)²/(2t²)} dτ

Separation of variables method for wave equation



5.17 u(x, t) = (2/π) Σ_{n=1}^∞ (1/n²) sin(2nπ/3) sin nx cos nt

5.19 u(x, t) = (4/π) Σ_{n=1}^∞ [(−1)^{n+1}/(2n−1)²] sin(2n−1)x cos(2n−1)t

Fourier transform for wave equation

5.21 u(x, t) = (1/2) ∫_{−∞}^∞ e^{−|ω|} cos ωt e^{iωx} dω

5.23 u(x, t) = (1/(2c)) ∫_0^t [ ∫_{x−c(t−τ)}^{x+c(t−τ)} f(s, τ) ds ] dτ

Laplace transform for wave equation

5.25 u(x, t) = (t − x) sinh(t − x)H(t − x) + xe^{−x} cosh t − te^{−t} sinh t, where H(·) is
the unit step Heaviside function.

5.27 u(x, t) = sin πx cos πt − (1/π) sin πx sin πt.

Separation of variables method for Laplace equation


5.29 u(x, y) = (2/π) Σ_{n=1}^∞ [(1 − (−1)^n)/(n sinh nπ)] sinh ny sin nx + (2/π) Σ_{n=1}^∞ [(1 − (−1)^n)/(n sinh nπ)] [sinh ny + sinh n(π − x)] sin ny


5.31 u(x, y) = (400/π) Σ_{n=1}^∞ [sin((2n−1)πx/2) / ((2n−1) sinh((2n−1)π/2))] sinh[(2n−1)π(1 − y)/2] + (200/π) Σ_{n=1}^∞ [1/(n sinh 2nπ)] sinh nπx sin nπy

5.33 u(x, y) = [sin 7πx sinh(7π(1 − y))]/sinh 7π + [sin πx sinh πy]/sinh π + [sin 3πy sinh(3π(1 − x))]/sinh 3π + [sinh 6πx sin 6πy]/sinh 6π

5.35 u(x, y) = (1/2)x + (2/π²) Σ_{n=1}^∞ [(1 − (−1)^n)/(n² sinh nπ)] sinh nπx cos nπy


5.37 u(x, y) = A0y + Σ_{n=1}^∞ An sinh(nπy/a) cos(nπx/a);  A0 = (1/(ab)) ∫_0^a f(x) dx,  An = [1/(a sinh(nπb/a))] ∫_0^a f(x) cos(nπx/a) dx.

Fourier transform for Laplace equation

5.41 u(x, y) = (2/π) ∫_0^∞ F(ω) [sinh ω(2 − y)/((1 + ω²) sinh 2ω)] sin ωx dω
5.43 u(x, y) = (1/π)[arctan((1 + x)/y) + arctan((1 − x)/y)]

Exercise Set - Chapter 6

6.6 The two iterations of Jacobi’s method give the following results.
a. (0.1428571, −0.3571429, 0.4285714)t
b. (0.97, 0.91, 0.94)t
c. (−0.65, 1.65, −0.4, −2.475)t
6.7 b. r0 = (1, 0, −2)^t, x1 = (5, 0, −1)^t,
r1 = (−1, −2, −5)^t, x2 = (.51814, −.72539, −1.94301)^t,
r2 = (1.28497, −.80311, .64249)^t, x3 = (1, −1.4, −2.2)^t

c. r0 = (1, 2, 0, −1)t , x1 = (.2, .4, 0, −.2)t


r1 = (1.2, −.8, −.8, −.4)t , x2 = (.90654, .46729, −.33645, −.57009)t
r2 = (−1.45794, −.59813, −.26168, −2.65421)t ,
x3 = (4.56612, .40985, −2.92409, −5.50820)t
r3 = (−1.36993, 1.11307, −3.59606, .85621)t ,
x4 = (9.50, 1.25, −10.25, −13.00)t
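Jacobi's method itself is a short loop. The sketch below runs it on an illustrative diagonally dominant 2 × 2 system of our own choosing (not one of the book's systems): 4x + y = 1, 2x + 5y = 2.

```python
# Minimal Jacobi iteration sketch: each new component uses only the
# previous iterate's values.
def jacobi(A, b, x0, iters):
    n = len(b)
    x = list(x0)
    for _ in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x

A = [[4.0, 1.0], [2.0, 5.0]]
b = [1.0, 2.0]
x = jacobi(A, b, [0.0, 0.0], 50)
# Exact solution is x = 1/6, y = 1/3.
assert abs(x[0] - 1/6) < 1e-8 and abs(x[1] - 1/3) < 1e-8
print(x)
```

Diagonal dominance guarantees convergence here; for the book's larger systems the same loop applies unchanged.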

Exercise Set - Chapter 7

7.1 xn yn
1.00 5.0000
1.10 3.9724
1.20 3.2284
1.30 2.6945
1.40 2.3163

1.50 2.0533

7.3 xn yn Actual Value


0.00 2.0000 2.0000
0.10 2.1230 2.1230
0.20 2.3085 2.3085
0.30 2.5958 2.5958
0.40 3.0649 3.0650
0.50 3.9078 3.9082
7.5 xn yn
1.00 1.0000
1.10 1.0101
1.20 1.0417
1.30 1.0989
1.40 1.1905
1.50 1.3333
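Tables like those above are produced by Euler's method. Since the underlying equations are not restated in this appendix, the sketch below applies the method to an assumed IVP, y′ = y with y(0) = 1, whose exact solution is e^x:

```python
import math

# Euler's method sketch: y_{k+1} = y_k + h * f(x_k, y_k).
def euler(f, x0, y0, h, steps):
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)
        x += h
    return y

# Integrate y' = y from x = 0 to x = 1 with a small step.
approx = euler(lambda x, y: y, 0.0, 1.0, 0.0001, 10000)
assert abs(approx - math.e) < 1e-3
print(approx)
```

Halving the step size roughly halves the error, reflecting the method's first-order accuracy.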
7.7 (a) Write the equation in the form
dv
= 32 − 0.125v 2 = f (t, v).
dt
tn vn
0.0 0.0000
1.0 25.2570
2.0 32.9390
3.0 34.9770
4.0 35.5500
5.0 35.7130

(b) Separating variables and using partial fractions we have

(1/(2√32)) [1/(√32 − √0.125 v) + 1/(√32 + √0.125 v)] dv = dt

(1/(2√32 √0.125)) [ln(√32 + √0.125 v) − ln(√32 − √0.125 v)] = t + c.

Since v(0) = 0, we find c = 0. Solving for v we obtain

v(t) = 16√5 (e^{3.2t} − 1)/(e^{3.2t} + 1)

and v(5) ≈ 35.7678. Alternatively, the solution can be expressed as

v(t) = √(mg/k) tanh(√(kg/m) t).

7.9 The figure shows the values of u(x, y) along the boundary.
We need to determine u11 and u21 . The system is
u21 + 2 + 0 + 0 − 4u11 = 0
1 + 2 + u11 + 0 − 4u21 = 0
or
−4u11 + u21 = −2
u11 − 4u21 = −3.
Solving we obtain u11 = 11/15 and u21 = 14/15.
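The 2 × 2 system in 7.9 can be solved exactly with Cramer's rule:

```python
from fractions import Fraction

# Solve -4*u11 + u21 = -2,  u11 - 4*u21 = -3 exactly.
a, b, c, d = -4, 1, 1, -4          # coefficient matrix [[a, b], [c, d]]
e, f = -2, -3                      # right-hand side
det = a * d - b * c                # = 15
u11 = Fraction(e * d - b * f, det)
u21 = Fraction(a * f - e * c, det)
assert (u11, u21) == (Fraction(11, 15), Fraction(14, 15))
print(u11, u21)
```

Using `Fraction` keeps the answers exact, matching the 11/15 and 14/15 quoted above.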

7.11 The figure shows the values of u(x, y) along the boundary. We need to
determine u11, u21, u12 and u22. By symmetry u11 = u21 and u12 = u22. The
system is

u21 + u12 + 0 + 0 − 4u11 = 0
0 + u22 + u11 + 0 − 4u21 = 0
u22 + √3/2 + 0 + u11 − 4u12 = 0
0 + √3/2 + u12 + u21 − 4u22 = 0

or

−3u11 + u12 = 0
u11 − 3u12 = −√3/2.

Solving we obtain u11 = u21 = √3/16 and u12 = u22 = 3√3/16.

7.13. The results of the computation are given in the table

Time x = 0.25 x = 0.50 x = 0.75 x = 1.00 x = 1.25 x = 1.50 x = 1.75


0.000 1.0000 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000
0.100 0.3728 0.6288 0.6800 0.5904 0.3840 0.2176 0.0768
0.200 0.2248 0.3942 0.4708 0.4562 0.3699 0.2517 0.1239
0.300 0.1530 0.2752 0.3448 0.3545 0.3101 0.2262 0.1183
0.400 0.1115 0.2034 0.2607 0.2757 0.2488 0.1865 0.0996
0.500 0.0841 0.1545 0.2002 0.2144 0.1961 0.1487 0.0800
0.600 0.0645 0.1189 0.1548 0.1668 0.1534 0.1169 0.0631
0.700 0.0499 0.0921 0.1201 0.1297 0.1196 0.0914 0.0494
0.800 0.0387 0.0715 0.0933 0.1009 0.0931 0.0712 0.0385
0.900 0.0301 0.0555 0.0725 0.0785 0.0725 0.0554 0.0300
1.000 0.0234 0.0432 0.0564 0.0610 0.0564 0.0431 0.0233

7.15 The table below gives a selection of the computed approximations.

Time x = 0.25 x = 0.50 x = 0.75 x = 1.00 x = 1.25 x = 1.50 x = 1.75


0.000 1.0000 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000
0.100 0.4015 0.6577 0.7084 0.5837 0.3753 0.1871 0.0684
0.200 0.2430 0.4198 0.4921 0.4617 0.3622 0.2362 0.1132
0.300 0.1643 0.2924 0.3604 0.3626 0.3097 0.2208 0.1136
0.400 0.1187 0.2150 0.2725 0.2843 0.2528 0.1871 0.0989
0.500 0.0891 0.1630 0.2097 0.2228 0.2020 0.1521 0.0814
0.600 0.0683 0.1256 0.1628 0.1746 0.1598 0.1214 0.0653
0.700 0.0530 0.0976 0.1270 0.1369 0.1259 0.0959 0.0518
0.800 0.04130 0.0762 0.0993 0.1073 0.0989 0.0755 0.0408
0.900 0.0323 0.0596 0.0778 0.0841 0.0776 0.0593 0.0321
1.000 0.0253 0.0466 0.0609 0.0659 0.0608 0.0465 0.0252

7.17(a) Identifying h = 1/4 and k = 1/10 we see that δ = 2/5.


Time x = 0.25 x = 0.5 x = 0.75
0.00 0.1875 0.2500 0.1875
0.10 0.1775 0.2400 0.1775
0.20 0.1491 0.2100 0.1491
0.30 0.1066 0.1605 0.1066
0.40 0.0556 0.0938 0.0556
0.50 0.0019 0.0148 0.0019
0.60 -0.0501 -0.0682 -0.0501
0.70 -0.0970 -0.1455 -0.0970
0.80 -0.1361 -0.2072 -0.1361
0.90 -0.1648 -0.2462 -0.1648
1.00 -0.1802 -0.2591 -0.1802
(b) Identifying h = 1/4 and ∆t = 1/10 we see that δ = 1/4

Time x = 0.4 x = 0.8 x = 1.2 x = 1.6


0.00 0.0032 0.5273 0.5273 0.0032
0.10 0.0194 0.5109 0.5109 0.0194
0.20 0.0652 0.4638 0.4638 0.0652
0.30 0.1318 0.3918 0.3918 0.1318
0.40 0.2065 0.3035 0.3035 0.2065
0.50 0.2743 0.2092 0.2092 0.2743
0.60 0.3208 0.1190 0.1190 0.3208
0.70 0.3348 0.0413 0.0413 0.3348
0.80 0.3094 -0.0180 -0.0180 0.3094
0.90 0.2443 -0.0568 -0.0568 0.2443
1.00 0.1450 -0.0768 -0.0768 0.1450
(c) (i) The solution of the wave equation is shown to be

u(x, t) = Σ_{n=1}^∞ (an cos nπt + bn sin nπt) sin nπx,

where

an = 2 ∫_0^1 sin nπx sin nπx dx = { 1, n = 1; 0, n = 2, 3, 4, ... }

and

bn = (2/(nπ)) ∫_0^1 0 dx = 0.

Thus

u(x, t) = cos πt sin πx.

(c) (i) and (ii): We have h = 1/4, ∆t = 0.5/5 = 0.1 and δ = 0.4. Now u0,n = u4,n = 0
for n = 0, 1, ..., 5, and the initial values of u are u1,0 = u(1/4, 0) = sin π/4 ≈
0.7071, u2,0 = u(1/2, 0) = sin π/2 = 1, u3,0 = u(3/4, 0) = sin 3π/4 ≈ 0.7071.
From equation (7.93) in the text we have uj,1 = 0.08(uj+1,0 + uj−1,0) + 0.84uj,0
+ 0.1(0). Then u1,1 ≈ 0.6740, u2,1 = 0.9531, u3,1 = 0.6740. From equation
(7.86) in the text we have, for n = 1, 2, 3, ...,

uj,n+1 = 0.16uj+1,n + 2(0.84)uj,n + 0.16uj−1,n − uj,n−1.

The results of the calculations are given in the table.



Time x = 0.25 x = 0.50 x = 0.75


0.0 0.7071 1.0000 0.7071
0.1 0.6740 0.9531 0.6740
0.2 0.5777 0.8169 0.5777
0.3 0.4272 0.6042 0.4272
0.4 0.2367 0.3348 0.2367
0.5 0.0241 0.0340 0.0241

i, j Approx Exact Error


1,1 0.6740 0.6725 0.0015
1,2 0.5777 0.5721 0.0056
1,3 0.4272 0.4156 0.0116
1,4 0.2367 0.2185 0.0182
1,5 0.0241 0.0000 0.0241
2,1 0.9531 0.9511 0.0021
2,2 0.8169 0.8090 0.0079
2,3 0.6042 0.5878 0.0164
2,4 0.3348 0.0000 0.0340
3,1 0.6740 0.6725 0.0015
3,2 0.5777 0.5721 0.0056
3,3 0.4272 0.4156 0.0116
3,4 0.2367 0.2185 0.0182
3,5 0.0241 0.0000 0.0241

Exercise Set - Chapter 8

8.1 (a) (5 − 9i) + (2 − 4i) = x + iy, so 7 − 13i = x + iy, giving x = 7, y = −13.

(b) (10 − 5i)/(6 + 2i) = (10 − 5i)(6 − 2i)/((6 + 2i)(6 − 2i)) = (60 − 30i − 20i − 10)/(36 + 4) = (1/40)(50 − 50i) = 5/4 − (5/4)i = x + iy,

where x = 5/4, y = −5/4.

8.3 (b) 5√2 (cos(7π/4) + i sin(7π/4))

8.5 (a)
z1 z2 = 8[cos(π/8 + 3π/8) + i sin(π/8 + 3π/8)] = 8i
z1/z2 = (1/2)[cos(π/8 − 3π/8) + i sin(π/8 − 3π/8)] = √2/4 − i√2/4
8.9 (a) f (z) = (7x−9y−3)+i(7y−9x+2) where u(x, y) = 7x−9y−3, v(x, y) =
(7y − 9x + 2)
8.11 lim_{z→1+i} (5z² − 2z + 2)/(z + 1) = (5(1 + i)² − 2(1 + i) + 2)/(1 + i + 1)
= (5(1 + i² + 2i) − 2i)/(2 + i)
= 8i/(2 + i) = 8i(2 − i)/5 = (8 + 16i)/5
8.13 (a) f′(z) = lim_{∆z→0} ((z + ∆z)² − z²)/∆z = lim_{∆z→0} (2z∆z + (∆z)²)/∆z = lim_{∆z→0} (2z + ∆z) = 2z

8.15 (a) 3i (b) 2i and −2i


8.17 Cauchy Riemann Equations
8.19 (a) u = y, v = x; ∂u/∂x = 0 = ∂v/∂y, ∂u/∂y = 1, −∂v/∂x = −1.
Since 1 ≠ −1, f is not analytic at any point.
(b) ∂u/∂x = (y² − x²)/(x² + y²)², ∂v/∂y = (x² − y²)/(x² + y²)², ∂u/∂y = −2xy/(x² + y²)² = ∂v/∂x.
The Cauchy-Riemann equations hold only at (0, 0). Since there is no neighbourhood about z = 0 within which f is differentiable, we conclude f is nowhere analytic.
8.21 (a) ∂²u/∂x² = 0, ∂²u/∂y² = 0, so ∂²u/∂x² + ∂²u/∂y² = 0. Thus u(x, y) = 2x − 2y is harmonic.
8.23 See [1]
8.25 See [1]
8.27 See [11]

8.28 (a) f(z) = (1/z⁵)[z − (z − z³/3! + z⁵/5! − z⁷/7! + ...)]
= 1/(3!z²) − 1/5! + z²/7! − z⁴/9! + ...
8.29 (a) From f(z) = (3z − 1)/([z − (−1 + 2i)][z − (−1 − 2i)]), −1 + 2i and −1 − 2i are simple poles.

8.31 (a) Res(f(z), 4i) = lim_{z→4i} (z − 4i)z/((z − 4i)(z + 4i)) = lim_{z→4i} z/(z + 4i) = 1/2

Res(f(z), −4i) = lim_{z→−4i} z/(z − 4i) = 1/2
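As a quick numerical sanity check (Python for illustration), the residue of f(z) = z/(z² + 16) at a simple pole z₀ can be approximated by evaluating (z − z₀)f(z) close to the pole:

```python
def f(z):
    # integrand of exercise 8.31: simple poles at z = +4i and z = -4i
    return z / (z * z + 16)

res = {}
for z0 in (4j, -4j):
    eps = 1e-7
    res[z0] = eps * f(z0 + eps)   # (z - z0) f(z) evaluated near the pole
print(res)   # both values are approximately 0.5
```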

8.33 (a)

∫₀^{2π} 1/(1 + 3 cos²θ) dθ = (4/i) ∮_C z/(3z⁴ + 10z² + 3) dz
= (4/i)(2πi)[Res(f(z), √3i/3) + Res(f(z), −√3i/3)]
= 8π(1/16 + 1/16) = π
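The value can be confirmed numerically; the midpoint sum below (Python, for illustration) agrees with the residue calculation, which gives π:

```python
import math

# midpoint rule for the periodic integrand 1/(1 + 3 cos^2 t) over [0, 2*pi]
n = 100000
dt = 2 * math.pi / n
total = sum(1.0 / (1.0 + 3.0 * math.cos((k + 0.5) * dt) ** 2)
            for k in range(n)) * dt
print(total)   # ≈ pi
```

For a smooth periodic integrand the midpoint rule converges very fast, so the sum matches π essentially to machine precision.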

8.35 (a) f is conformal at all points except z = ±1, since f′(z) = 3(z² − 1) ≠ 0 for z ≠ ±1.
8.37 The image of the line Im z = 2 under the linear fractional transformation w = (2z + i)/(z − i) is the circle with center at (7/2, 0) and radius 3/2.
8.39 See [1] and [11]
8.41 See Example 3 of Section 20.2, p. 901 of [11]

Exercise Set - Chapter 9

9.1 The relevant equation

D² − 2(ct + c²/g)D + (ct)² = 0

has two roots,

D = ct + c²/g ± √(2c³t/g + c⁴/g²).

Clearly D < ct (t is the total time), so the root corresponding to the + sign is physically impossible.
9.3 w(x) = (π/4) cos(πx/2).

9.5 See, [19]


9.7 See, [19]
9.9 γ(t) = the variable interest rate = 1/(2t + 10) + 0.02

Exercise Set - Chapter 10


10.1 (a) [0,1], [0,1]
(b) For the Haar wavelet, see Figure 10.2. The function ϕ(x − k) has the same graph as ϕ but translated to the right by k units (assuming k is a positive integer). Let V₀ denote the space of all functions of the form

Σ_{k∈Z} a_k ϕ(x − k), a_k ∈ R.

ψ(t) = ϕ(2t) − ϕ(2t − 1) is the Haar wavelet.
It can be checked that ψ(2t − k), k ∈ Z, is orthonormal.
The graph of the function ϕ(2x − k) = ϕ(2(x − k/2)) is the same as the graph of the function ϕ(2x) but shifted to the right by k/2 units. The graph of ϕ(2^j t) is a spike of width 1/2^j.
(c) ψ(t) = φ(2t) − φ(2t − 1).
Graph of ψ(t − 3): note the shift of 3 units to the right.
Graph of ψ(2t − 3) = ψ(2(t − 3/2)): note the shift of 3/2 units to the right and compression by a factor of 2.
Graph of ψ(4t − 3) = ψ(2²(t − 3/4)): note the shift of 3/4 units to the right and compression by a factor of 2².
Graph of ψ(2³t): note the compression by a factor of 2³.

ψ(2t − k) = ϕ(2(2t − k)) − ϕ(2(2t − k) − 1) = ϕ(4t − 2k) − ϕ(4t − 2k − 1)

For any integers j and k,

∫_{−∞}^{∞} ψ(2t − k)ψ(2t − j) dt = 0 if j ≠ k.

See Figures E.1 to E.4. For more details, see [31].

FIGURE E.1: A dilated Haar scaling function φ(t/2)

FIGURE E.2: A translated Haar scaling function φ(t + 3), shifted to the left by t₀ = −3.
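This orthogonality can be checked with a midpoint-rule quadrature (Python sketch; the grid is chosen so that no node lands on a jump point of ψ):

```python
def psi(t):
    # Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1:
        return -1.0
    return 0.0

n = 4000
dt = 4.0 / n
grid = [-2 + (i + 0.5) * dt for i in range(n)]   # midpoints over [-2, 2]

def inner(j, k):
    return sum(psi(2 * t - j) * psi(2 * t - k) for t in grid) * dt

print(inner(0, 1), inner(1, 2), inner(0, 0))   # 0, 0, and ||psi(2t)||^2 = 1/2
```

The translates have disjoint supports, so the cross inner products vanish exactly, while ∫ψ(2t)² dt = 1/2.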

FIGURE E.3: A translated Haar scaling function φ(t − 3), shifted to the right by t₀ = 3.

FIGURE E.4: The scaled (contracted) Haar wavelet ψ(2t)

10.3 (i) ⟨ϕ, ψ⟩ = ∫_{−∞}^{∞} ϕ(t)ψ(t) dt
= ∫_{−∞}^{0} ϕ(t)ψ(t) dt + ∫_{0}^{1/2} ϕ(t)ψ(t) dt + ∫_{1/2}^{1} ϕ(t)ψ(t) dt + ∫_{1}^{∞} ϕ(t)ψ(t) dt
= 0 + ∫_{0}^{1/2} 1 dt − ∫_{1/2}^{1} 1 dt
= 1/2 − 1/2 = 0.

Hence ϕ and ψ are orthogonal.


(iii) ∫_{−∞}^{∞} ψ²(t) dt = ∫_{0}^{1} ψ²(t) dt = ∫_{0}^{1/2} ψ(t)ψ(t) dt + ∫_{1/2}^{1} ψ(t)ψ(t) dt
= ∫_{0}^{1/2} 1 dt + ∫_{1/2}^{1} (−1)(−1) dt = 1.
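Both computations check out numerically; a small Python sketch with the Haar pair ϕ = 1 on [0, 1) and ψ as above:

```python
def phi(t):
    # Haar scaling function: 1 on [0, 1), 0 elsewhere
    return 1.0 if 0 <= t < 1 else 0.0

def psi(t):
    # Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1:
        return -1.0
    return 0.0

n = 1000
dt = 1.0 / n
mid = [(k + 0.5) * dt for k in range(n)]        # midpoints of [0, 1]
ip = sum(phi(t) * psi(t) for t in mid) * dt     # <phi, psi>
norm2 = sum(psi(t) ** 2 for t in mid) * dt      # ||psi||^2
print(ip, norm2)   # 0.0 and 1.0
```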
10.5 ϕ(x − k), k ∈ Z, is an orthonormal system, that is,

⟨ϕ(x − k), ϕ(x − j)⟩ = 0 for j ≠ k, ‖ϕ(x − k)‖²_{L²} = 1.

In fact, 2^{j/2} ϕ(2^j t − k) is an orthonormal system. ϕ(x − j) and ϕ(x − k), j ≠ k, have disjoint support.
10.7 Let f(t) denote a signal over the interval [a, b]; then its energy is defined as E_f = (∫_a^b |f(t)|² dt)^{1/2}. If f(t) is a discrete signal then E_f = (Σ_{n=1}^{N} |f(t_n)|²)^{1/2}. Energy is a characteristic of a signal. If a function is represented by a wavelet series then E_f = (Σ_{j=1}^{N} Σ_{k=1}^{M} |d_{j,k}|²)^{1/2}, where d_{j,k} are the wavelet coefficients of f, and

E_f = (‖Wf‖²)^{1/2},

where Wf is the continuous wavelet transform of f. For more details see [1],
[14], [18], [22], [28], [34] and [33] through [40].
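The energy identity behind this (Parseval's relation for an orthonormal wavelet basis) can be illustrated with a tiny discrete Haar transform in Python; the sample signal is made up for the example:

```python
import math

def haar(a):
    # full orthonormal discrete Haar transform (length of a is a power of 2)
    coeffs = []
    while len(a) > 1:
        avg = [(a[2*i] + a[2*i + 1]) / math.sqrt(2) for i in range(len(a) // 2)]
        det = [(a[2*i] - a[2*i + 1]) / math.sqrt(2) for i in range(len(a) // 2)]
        coeffs.extend(det)
        a = avg
    coeffs.extend(a)
    return coeffs

f = [4.0, 2.0, 5.0, 5.0, 1.0, -1.0, 0.0, 2.0]
E_time = math.sqrt(sum(x * x for x in f))             # energy from samples
E_wavelet = math.sqrt(sum(c * c for c in haar(f)))    # energy from coefficients
print(E_time, E_wavelet)   # equal: the transform is orthonormal
```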

Exercise Set - Chapter 11


11.1 (a)(i) The fractional dimension of the Cantor set is log 2/ log 3 ≈ 0.631.
(ii) The fractional dimension of the Sierpinski triangle is log 3/ log 2 ≈ 1.585.
(iii) The fractional dimension of the von Koch curve is log 4/ log 3 ≈ 1.2619.
(b) See, [33] and [34]
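These values follow from the similarity dimension D = log N / log s for a self-similar set made of N copies, each scaled by 1/s. A quick check in Python:

```python
import math

# (N copies, scale factor s) for each self-similar set
examples = {
    "Cantor set": (2, 3),
    "Sierpinski triangle": (3, 2),
    "von Koch curve": (4, 3),
}
for name, (N, s) in examples.items():
    print(name, round(math.log(N) / math.log(s), 4))
```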
11.3 See, [35]
11.5 See, [9] and [11]
11.7 Yes, See, [7], [8], [22], [24] through [29], [32], [39] through [44], [52] and
[53]
11.9 Hurst is the name of the hydrologist who studied Nile River flows and reservoir modeling [27].
Let H denote the Hurst parameter and let D denote the (Hausdorff) fractional dimension; then D = 2 − H (B. B. Mandelbrot).
11.11 See, [6, p.8]

FIGURE E.5

FIGURE E.6

FIGURE E.7

FIGURE E.8

11.13 A graph that has neither self-loops nor parallel edges is called a simple graph.
11.15 Choosing F = X² in the Itô lemma we get

dF = 2X dX + dt,

so

X²(t) = F(X(t)) = F(0) + ∫₀ᵗ 2X dX + ∫₀ᵗ 1 dt = ∫₀ᵗ 2X dX + t.

Therefore ∫₀ᵗ X(Γ) dX(Γ) = (1/2)X²(t) − (1/2)t.
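The identity can be checked on a simulated Brownian path: the left-point (Itô) sum Σ Xᵢ ∆Wᵢ telescopes exactly to ½X² − ½Σ(∆Wᵢ)², and Σ(∆Wᵢ)² → t as the step shrinks. A Python sketch:

```python
import random

random.seed(42)
n, T = 200000, 1.0
dt = T / n
X, ito_sum = 0.0, 0.0
for _ in range(n):
    dW = random.gauss(0.0, dt ** 0.5)
    ito_sum += X * dW        # left-point evaluation (Ito convention)
    X += dW

print(ito_sum, 0.5 * X * X - 0.5 * T)   # close for large n
```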

11.17 The knots are x = 0, x = 1 and x = 4


11.19 See, [15].
Bibliography

[1] A. Iske and T. Randen (Eds), Mathematical Methods and Modelling in


Hydrocarbon Exploration and Production. Springer, Berlin, New York,
2006.
[2] S. M. Luthi, Geological Well Logs: Their Use in Reservoir Modeling,
Springer, Berlin, 2001.
[3] R. C. Selly, Elements of Petroleum Geology, Academic Press, San Diego,
1998.
[4] J. Stewart, Calculus, Brooks/Cole Publishing Company, 1999.
[5] G. Strang, Calculus, PDF file at MIT http://onlinebooks.library.upenn.
edu/book/lookpid. QA303. 57.

[6] E. W. Swokowski, M. Olinick, D. Pence, J. A. Ecole, Calculus, Sixth


Edition, PWS Publishing, Boston, 1994.
[7] S. T. Tan, Applied Calculus, Second Edition, PWS, Kent Publishing,
Boston, 2007.
[8] T.M. Buzug, Computed Tomography, Springer, 2008.

[9] T.G. Feeman, The Mathematics of Medical Imaging, Springer, 2010.

Computer Programs Used

Chapter 1
Section 1.8 (MATLAB and MATHEMATICA)

Chapter 2
(i) Visualization of scalar and vector valued functions
Let us discuss an important feature of MATLAB: its powerful graphics handling. MATLAB can present our output and help us interpret the data graphically with the help of various types of curves and plots.
Plotting scalar functions with MATLAB
2D plots: To plot a function, we have to create two arrays (vectors), one containing the abscissas and the other the corresponding function values. Let us plot f(x) = sin(x). Type the following commands at the command prompt.
>> x=-2*pi:pi/100:2*pi; % range of x from -2*pi to +2*pi in steps of pi/100
>> fx=sin(x); % compute sine of all x
>> plot(x,fx) % plot function, which plots fx vs. x
>> grid % creates grid in plot
After entering these commands and pressing the Enter key, we get the curve shown in Figure E.9.

Let us now plot f(t) = e^{-t/10} sin(t). Type the following commands at the command prompt.
>> t=0:0.01:50;
>> ft=exp(-t/10).*sin(t);
>> plot(t,ft)
>> grid
In the above two examples, we used the plot() function for plotting. Now we will see how to combine two or more plots in one window. For simplicity, let us combine the plot of f1(t) = e^{-t/10} sin(t) with another function f2(t) = e^{-t/10} in the same figure window.


FIGURE E.9

FIGURE E.10

Type the following commands:


>> t=0:0.01:50;
>> ft1=exp(-t/10);
>> ft2=exp(-t/10).*sin(t);
>> plot(t,ft1,t,ft2) % This is how we can combine two plots.
>> grid

FIGURE E.11

Similarly, we can add several graphs in the same window with the command plot(x,fx,y,fy,z,fz,...). Please see the MATLAB help for more detail.

We can plot several other types of 2D plots in MATLAB with different attributes such as colors, styles, etc. Here are some more examples:
Function MATLAB Code
>> x = 0:0.1:10;
Stairs() >> y = exp(-x).*sin(x)
>> stairs(x,y)
>> x = 0:0.1:10;
Area() >> y = exp(-x).*sin(x);
>> area(x,y)
>> x = -4*pi:pi/5:4*pi;
>> y =sin(x)./x;
Stem() >> y((length(y)-1)/2+1) =1;
>> stem(x,y)
>> x = -4*pi:pi/5:4*pi;
>> y = sin(x)./x;
Bar() >> y((length(y)-1)/2+1) =1;
>> bar(x,y)
>> x =0:0.1:10;
Semilogx() >> y =x.*exp(-x);
>> semilogx(x,y)
>> grid
>> x =0:0.1:10;
Semilogy() >> y =x.*exp(-x);
>> semilogy(x,y)
>> grid
>> x =0:0.1:10;
Loglog() >> y =x.*exp(-x);
>> loglog(x,y)
>> grid
>> theta =0:pi/100:2*pi;
Polar() >> r = sqrt(abs(sin(4*theta)));
>> polar(theta, r)
Output graphs of these examples are shown in Figure E.12. There are also several other types of plots; one can go through the MATLAB help.
Three-dimensional (3D) plots: Let us now look at 3D plots. We can plot three-dimensional graphs for functions of two variables such as z = f(x, y). Here we plot x(t) = e^{-t/10} sin(t) versus y(t) = e^{-t/10} cos(t) along with the t axis (see Figure E.13). Type the following commands
>> t=0:0.01:30;
>> x=exp(-0.1*t).*sin(t);
>> y=exp(-0.1*t).*cos(t);
>> plot3(x,y,t) % 3D plot command.
>> grid
We will get the following figure.
MATLAB has several specialized 3D plots. Just try this one-

FIGURE E.12: Stairs, area, stem, bar, polar, loglog, semilogx and semilogy plots

FIGURE E.13

>> [x,y]=meshgrid(-8:0.5:8);
>> r=sqrt(x.^2+y.^2) + eps;
>> z=sin(r)./r;
>> mesh(x,y,z)
This will give the following result.

One can try the following plot commands on the same function code and see the results.

>> surf(x,y,z)
>> contour(z)
>> surfc(x,y,z)
>> surfl(x,y,z)
>> meshz(x,y,z)
>> waterfall(z)
Please see the help on each command in MATLAB help.
(ii) MATLAB and MATHEMATICA for Differential equation Section 2.3.

Chapter 3
Plots for vector valued functions in 2D and 3D

FIGURE E.14

In various applications, we need to visualize vector valued functions (vector fields). MATLAB has several functions to visualize vector fields in 2D and 3D, such as quiver, quiver3, stream2, stream3, streamline, streamslice, streamtube, streamribbon, streamparticles, coneplot, divergence and curl.
Here we plot for the function z = x e^{-(x² + y²)}.
First, we take a quiver plot. Write the following code at the command prompt:
>> [x,y] = meshgrid(-2:0.1:2);
>> z = x.*exp(-x.^2 - y.^2);
>> [dx,dy] = gradient(z);
>> quiver (x,y,dx,dy)
We get a quiver plot of the gradient field.

Now we use the streamline function to plot a vector field ui + vj, where u = f(x, y) and v = g(x, y) are given as

u = x + y − x(x² + y²) and v = −x + y − y(x² + y²).

We can achieve this by writing the following code at the command prompt:

>> [x,y] = meshgrid(-2:0.1:2);
>> u=x+y-x.*(x.^2+y.^2);
>> v=-x+y-y.*(x.^2+y.^2);

>> x0=[-2 -2 -2 -2 -.5 -.5 .5 .5 2 2 2 2 -.1 -.1 .01 .01];


>> y0=[-2 -.5 .5 2 -2 2 -2 2 -2 -.5 .5 2 -.01 .01 -.01 .01];
>> streamline(x,y,u,v,x0,y0)
>> axis square
We get the streamline plot of the field.
For other types of plots, kindly go through the MATLAB help.

Chapter 4

Here we present a very versatile program written in MATLAB to compute the Fourier series of a given function. This program can be used to compute the Fourier series of any function with only small modifications to the main program body.
The program code given below finds the Fourier series of the square function, which is defined as:

f(x) = { -1, -π ≤ x < 0;  1, 0 < x ≤ π }

% Fourier analysis of the square function for the first 15 harmonics.
t0=-pi; % initial time
t0_T=pi; % final time

mp=0; % mid point
T=t0_T-t0; % time period
syms t; % symbolic variable declaration
ft = -diff(t); % -1 part of function
ftt= diff(t); % 1 part of function
w0=2*pi/T; % frequency
n=1:15; % number of harmonics
% computation of trigonometric Fourier series coefficients
a0=1/T*(int(ft,-pi,0)+int(ftt,0,pi));
an=2/T*(int(ft*cos(n*w0*t),-pi,0)+int(ftt*cos(n*w0*t),0,pi));
bn=2/T*(int(ft*sin(n*w0*t),-pi,0)+int(ftt*sin(n*w0*t),0,pi));
ann=an.*cos(n*w0*t);
bnn=bn.*sin(n*w0*t);
avg=double(a0); % converting symbolic variable to value
t=-pi:pi/100:pi;
suma=0; sumb=0;
for j=1:15 % taking 15 harmonics
    sumb=sumb+bnn(j);
    suma=suma+ann(j);
end
bnsum=eval(sumb);

ansum=eval(suma);
plot(t, avg+bnsum+ansum) % plot of truncated harmonic sum
hold on
% plotting actual function
t1=-pi:pi/1000:mp;
plot(t1,-ones(size(t1)),'r')
t2= mp:pi/1000:pi;
plot(t2,ones(size(t2)),'r')
grid on
% formatting plot
xlabel('Time')
ylabel('Amplitude')
title('Fourier approximation plot for 15 harmonics for square function')
legend('Fourier Approximation','Actual Function')
The result is shown below.

Here is another program to plot harmonics of the ramp function, which is defined as:

f(x) = { 0, -1 ≤ x ≤ 0;  x, 0 ≤ x ≤ 1 }

One can see the required modifications in this program as compared with the previous one.
% Fourier analysis of the ramp function for the first 5 harmonics.
t0=-1; % initial time
t0_T=1; % final time
mp=0; % mid point
T=t0_T-t0; % time period
syms t; % symbolic variable declaration
ft=diff(diff(t)); % zero part of function
ftt=t; % t part of function
w0=2*pi/T; % frequency
n=1:5; % number of harmonics
% computation of trigonometric Fourier series coefficients
a0=1/T*(int(ft,-1,0)+int(ftt,0,1));
an=2/T*(int(ft*cos(n*w0*t),-1,0)+int(ftt*cos(n*w0*t),0,1));
bn=2/T*(int(ft*sin(n*w0*t),-1,0)+int(ftt*sin(n*w0*t),0,1));

FIGURE E.15: Fourier Approximation Plot for 15 Harmonics for Square Function
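For the square wave above, the symbolic coefficients computed by the program reduce to a₀ = 0, aₙ = 0 and bₙ = 4/(nπ) for odd n, so the truncated series can also be checked directly (Python sketch, for illustration):

```python
import math

def square_partial(x, n_harm=15):
    # partial Fourier sum of the square wave: (4/pi) sum over odd k of sin(kx)/k
    return sum(4 / (k * math.pi) * math.sin(k * x)
               for k in range(1, n_harm + 1, 2))

print(square_partial(math.pi / 2))    # approaches f(pi/2) = 1 as n_harm grows
print(square_partial(-math.pi / 2))   # approaches f(-pi/2) = -1
```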

ann=an.*cos(n*w0*t);
bnn=bn.*sin(n*w0*t);
avg=double(a0); % converting symbolic variable to value
t=-1.5:0.010:1.5;
suma=0; sumb=0;
for j=1:5 % first 5 harmonics
    sumb=sumb+bnn(j);
    suma=suma+ann(j);
end
bnsum=eval(sumb);
ansum=eval(suma);
plot(t, avg+bnsum+ansum) % plot of truncated harmonic sum
hold on
% plotting actual function
t1=-1:0.010:0;
plot(t1,zeros(size(t1)),'r')

t2=0:0.010:1;
plot(t2,t2,'r')
grid on
% formatting plot
xlabel('Time')
ylabel('Amplitude')
title('Fourier approximation plot for 5 harmonics for ramp function')
legend('Fourier Approximation','Actual Function')

FIGURE E.16: Fourier Approximation Plot for 5 Harmonics for Ramp Function

Chapter 5
Simulation of Heat Equation, Wave Equation and Laplace Equation by MAT-
LAB 404-423, Section 5-6.

Chapter 6
Algorithms in Section 6.9.1, 6.9.2, 6.9.3 and 6.9.5.

Chapter 9
For the MATLAB programs used for the computations in this chapter, we refer to [19].

Chapter 10 and Chapter 11


The following computer programs have been used in these two chapters.
1. Program to Extract the Variables from the Raw Data File
clear all
close all
clc
STN = input('Enter Station Name: ','s');
R = input('Enter Station Data File name with Extension: ','s');
[ndata, headertext] = xlsread(R);
disp(headertext(1,:));
fid = fopen('test.txt','r+');
g1 = fgetl(fid); g2 = fgetl(fid);
disp(sprintf('\n \n %s \n \n %s \n', g1, g2));
pos=ftell(fid); fclose(fid);
a1 = input('Please Input the Column Representing Year: ');
a2 = input('Please Input the Column Representing Month: ');
a3 = input('Please Input the Column Representing Day: ');
a=input('Enter the Date of first reading [DD-MM-YYYY]: ','s');
b=input('Enter the Date of last reading [DD-MM-YYYY]: ','s');
c=datenum(b)-datenum(a);
d=datenum(a)+31+28;
datestr(d+365);
a4=datenum(d)-datenum(a);
N = input('Enter The Number of Variables of Interest: ');
for i=1:1:N
    disp(sprintf('\n Pass the Column Number of Variable %d: \n', i))
    B(1,i) = input('Column Number: ');
    if (i==N)
        fid = fopen('test.txt','r');
        fseek(fid, pos, 'bof'); g1 = fgetl(fid);
        disp(sprintf('%s \n ',g1));
        pos=ftell(fid); fclose(fid);
        disp('THE VARIABLES YOU ENTERED ARE AS FOLLOWS:-')
        disp(headertext(1 , B(1,:)));
        fid = fopen('test.txt','r');
        fseek(fid, pos, 'bof'); g1 = fgetl(fid);
        disp(sprintf('%s \n ',g1));
        pos=ftell(fid); fclose(fid);
        disp('If the Variables are correct, Enter 1, else Enter 2')
        tt= input('ENTER YOUR CHOICE: ');
        if (tt==1)
            C(:,i)= ndata(:,B(1,i));
        elseif (tt==2)
            fid = fopen('test.txt','r');
            fseek(fid, pos, 'bof'); fgetl(fid)
            pos=ftell(fid); fclose(fid);
            disp('Please Choose the Correct Column Numbers');
        else
            error('INPUT CAN BE 1 OR 2 ONLY');
        end
    end
end
x1=[]; y1=[];
for k = 1:length(B)
    x1=[x1 seasonal(a4, ndata(:,B(1,k)), k)];
end

2. Program to Extract Seasonal Data


function [x , y] = seasonal(a4, lmd, t)
S=[];
for i = 1:1:length(lmd)/365
if (i==3||i==7||i==11||i==15)
S=[S lmd(a4+365*(i-1)+2:a4+365*(i-1)+215 )];
else
S=[S lmd(a4+365*(i-1)+1:a4+365*(i-1)+214 )];
end
end
x = [];
for i =1:2:length(lmd)/365

x =[x
S(:,i)
S(:,i+1)];
if (i==15)
break
end
end
WA = []; WB = [];
for i = 1:1:length(lmd)/365
WA = [WA lmd(365*(i-1)+1:a4+365*(i-1))];
WB = [WB lmd(365*(i-1)+a4+215:365*i )];
end
y = [];
for i =1:2:length(lmd)/365
y =[y
WA(:,i)
WB(:,i)
WA(:,i+1)
WB(:,i+1)];
if (i==15)
break
end
end

3. Program for Wavelet Spectrum Calculation


clear all
close all
clc
STN1 = input('Enter the 1st Location: ','s');
STN2 = input('Enter the 2nd Location: ','s');
load (STN1)
load (STN2)
fx1=_speed(:,1);
fx2=_speed(:,1);
a=100:100:20000;
b1=1:50:8000;

b2=1:50:8000;
x=1:1:length(fx1);
x=x’;
for k = 1:1:length(a)
for i =1:1:length(b1)
c1=b1(i).*ones(size(x));
c2=b2(i).*ones(size(x));
y1=abs(x-c1);
y1=(y1.*y1)./(a(k).^2);
y2=abs(x-c2);
y2=(y2.*y2)./(a(k).^2);
z1=exp(-y1./2)/(a(k)*(2*pi)^0.5);
z1=(2-y1).*z1;
z2=exp(-y2./2)/(a(k)*(2*pi)^0.5);
z2=(2-y2).*z2;
phi1=conj(z1);
phi2=conj(z2);
w1(k,:)=fx1.*phi1;
w2(k,:)=fx2.*phi2;
m1(k,:)=abs(w1(k,:)).^2;
m2(k,:)=abs(w2(k,:)).^2;
end
n1(k)=sum(m1(k,:))/a(k);
n2(k)=sum(m2(k,:))/a(k);
a(k)
end
figure; plot(a,n1,'k:','LineWidth',2); hold on;
plot(a,n2,'r-','LineWidth',2);
xlabel('Scale'); ylabel('Wavelet Spectrum');
set(gca, 'XTick', 0:5000:20000);
set(gca, 'XTickLabel', 0:5000:20000);
legend(STN1,STN2);

4. Program for Wavelet Correlation Calculation


clear all
close all
clc

STN1 = input('Enter the 1st Location: ','s');
STN2 = input('Enter the 2nd Location: ','s');
STN3 = input('Enter the Variable: ','s');
load (STN1)
load (STN2)
fx1=riyadh_pr(1:5800,1);
fx2=guriat_pr(1:5800,1);
x=1:1:length(fx1);
x=x’;
w1=[];tmp2=[];tmp1=[];
a=1:100:20000;
b1=-750:10:750;
for k = 1:1:length(a)
for i =1:1:length(b1)
c1=b1(i).*ones(size(x));
y1=abs(x-c1);
y1=(y1.*y1)./(a(k).^2);
z1=exp(-y1./2)/(a(k)*(2*pi)^0.5);
z1=(2-y1).*z1;
phi1=conj(z1);
w1(k,i)=phi1’*fx1;
w2(k,i)=phi1’*fx2;
tmp1=(w1.^2)’;
tmp2=(w2.^2)’;
m1=sum(tmp1)/a(k);
m2=sum(tmp2)/a(k);
end
a(k)
end
p1=(sum(w1*conj(w2)’));
p2=(m1.*m2).^0.5;
r=p1./p2;
figure; plot(a,r,':r','LineWidth',2); xlabel('Scale');
ylabel('Wavelet Correlation Coefficient');
set(gca, 'XTick', 0:5000:20000);
set(gca, 'XTickLabel', 0:5000:20000);

legend(STN3);

5. Program for Pointwise Holder Exponent Using DWT
clear all
close all
clc
STN = input('Enter the Station Name: ','s');
STNFILE = [STN,'_1dexp_nonparametric']
load (STNFILE)
a=0:1000:6000;
subplot(3,1,1), plot(__humid,'-b');
title('Daily Humidity (1990-2005)','FontSize',8);
subplot(3,1,2), plot(__humid_HDt0,'-r'); hold on;
plot(__humid_HDt1,'-b');
legend('db2','db20');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,3), plot(__humid_HDt2,'-r'); hold on;
plot(__humid_HDt3,'-b');
legend('coef6','coef24');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,1), plot(__pr,'-b');
title('Daily Pressure (1990-2005)','FontSize',8);
subplot(3,1,2), plot(__pr_HDt0,'-r');
hold on; plot(__pr_HDt1,'-b');
legend('db2','db20');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,3), plot(__pr_HDt2,'-r');
hold on; plot(__pr_HDt3,'-b');
legend('coef6','coef24');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,1), plot(__rain,'-b');
title('Daily Rain (1990-2005)','FontSize',8);
subplot(3,1,2), plot(__rain_HDt0,'-r');
hold on; plot(__rain_HDt1,'-b');
legend('db2','db20');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,3), plot(__rain_HDt2,'-r');
hold on; plot(__rain_HDt3,'-b');
legend('coef6','coef24');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,1), plot(__speed,'-b');
title('Daily Speed (1990-2005)','FontSize',8);
subplot(3,1,2), plot(__speed_HDt0,'-r');
hold on; plot(__speed_HDt1,'-b');
legend('db2','db20');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,3), plot(__speed_HDt2,'-r');
hold on; plot(__speed_HDt3,'-b');
legend('coef6','coef24');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,1), plot(__temp,'-b');
title('Daily Temperature (1990-2005)','FontSize',8);
subplot(3,1,2), plot(__temp_HDt0,'-r');
hold on; plot(__temp_HDt1,'-b');
legend('db2','db20');
title('DWT-based pointwise Holder exponent','FontSize',8);
subplot(3,1,3), plot(__temp_HDt2,'-r');
hold on; plot(__temp_HDt3,'-b');
legend('coef6','coef24');
title('DWT-based pointwise Holder exponent','FontSize',8);
Index

acceleration, 308 complementary function, 157


analytic part, 560 exact, 148
ANFIS, 697 general solution, 156
angular momentum, 309 homogeneous, 153
annihilator operator, 166 linear, 119
Archimedes principle, 321 non linear, 119
auto-correlation, 661 non-homogeneous, 153
order, 118
backward difference approximation, ordinary, 117
497 partial, 118
Bessel’s equation, 232 differential operator, 154
boundary and initial conditions, 392 linear, 154
boundary value problem, 123, 152 direct problem, 600
direction cosines, 7
central difference approximation, 497 Dirichlet conditions, 392
characteristic equation, 48 divergence theorem, 297
complex number, 525 domain, 47
geometrical representation, 526 multiply connected, 549
polar form, 528 simply connected, 549
sequence, 536
series, 536 eigenvalue
compression sensing, 715 inverse problem, 603
conformal mapping, 576
conjugate harmonic function, 545 mean absolute, 64
contour, 546 mean square, 64
convolution, 374 mean square signal-to-noise
correlations, 660 ratio, 64
Crank-Nicolson method, 509 root mean square, 64
critically damped, 130 Euler’s method, 499
cross-correlations, 660
current, 33 finite difference method, 495, 506
forward difference approximation,
determinants 496
Cramer rule, 40 Fourier series, 396
properties, 37 Fourier transform, 397
differential equation, 117 cosine, 378
particular solution, 156 discrete, 376


fast, 377 integration


inverse, 371 line integral
sine, 378 scalar field, 283
fourier transform, 367 vector field, 282, 285
fractal, 656 vector fields, 282
chaos, 671 inverse problem, 600
wavelets, 676 causation problem, 605
fractal image processing, 665 draining, 603
fractional Brownian motion, 658 eigenvalue, 603, 607
free vibrations, 130 eigenvector, 607
frequency spectrum, 364 finance, 611
Frobenius method, 229 hanging cable, 604
function, 155, 537 heat equation, 612
analytic, 539 identification, 602
complex function, 537 matrix equation, 605
continuous, 538 option pricing, 617
derivative, 538 projectile motion, 601
domain, 537 scattering, 602
gradient, 272 Torricelli’s law, 601
harmonic, 544 trajectories, 604
linear combination, 344 wave equation, 614
linearly dependent, 155, 344 isotherms, 585
linearly independent, 155, 344
norm, 339 Konigsberg bridge problem, 703
orthogonal, 339
piecewise continuous, 339 Laplace equation, 404
range, 537 Laplace transform, 190
scalar product, 339 laurent expansion, 559
vector valued, 265 Legendre polynomial, 236
fuzzy, 692 Legendre’s equation, 234
Legendre-Galerkin method, 516
graph theory, 700 linear fractional transform, 581
Green-Ostrogradski theorem, 294 Lotka-Volterra predator-prey model,
135
Hankel transform, 378
Hausdorff measure, 657 Möbius transform, 581
Hausdorff metric, 668 Maclaurin series, 224
heat equation, 391 Markov chain, 71
heat potential, 585 Markov Process, 71
Hurst exponent, 660 martix
non-singular, 44
indicial equation, 230 mathematical model, 126
initial conditions, 392 matrix, 11
initial value problem, 123, 151 inverse, 44
integral transforms, 377 adjoint, 44
integrating factor, 146 augmented matrix, 21

determinants, 36 absolutely convergent, 223


diagonal, 22, 25 convergent, 223
echelon form, 25 divergent, 223
identity, 22 interval of convergence, 223
invertible, 24 radius of convergence, 223
invertible matrix, 44
lower triangular, 23 range, 47
Markov, 70 residue Theorem, 558
multiplication, 18 Robin Mixed conditions, 392
non invertible, 24 Runge-Kutta method, 500
non-singular, 24
orthogonal, 25 scaling, 370
probability, 70 scalogram, 679
rank, 22 Schwarz inequality, 258
regular, 71 Separable variables, 143
row stochastic, 70 separable variables, 126
scalar, 22 series
singular, 24 Fourier, 345, 346
skew symmetric, 23 complex form, 359
square, 12 geometric, 536
sum, 14 Laurent, 558
symmetric, 23 partial sum , 536
transition, 70 trigonometric, 345
transpose, 17, 20 Shannon Sampling Theorem, 382
upper triangular, 23 simulation
zero, 22 heat equation, 414
momentum, 308 Laplace equation, 424
multifractals, 656 wave equation, 421
multistep method, 502 singularity, 559
Adams method, 502 singularity spectrum, 657
Adams-Bashforth method, 503 solution, 120
Adams-Moulton method, 503 explicit, 120
particular, 120
Neumann conditions, 392 singular, 120
neural network, 684 spectral method, 514
neuro-fuzzy, 692 spectrum, 657
spline interpolation, 712
open set, 272 Stokes theorem, 302
over-damping, 130 Sturm-Liouville problem, 378
surface integral, 290, 317
Partition function, 658 system
path, 546 orthogonal, 340
phase angle, 364 orthonormal, 340
Poisson equation, 413, 504
potential function, 288 Taylor series, 224
power series, 223 Taylor’s formula, 496

time series, 671


components, 674
translation, 370
triangle inequality, 254

uncertainty principle, 374

vector, 2, 252
components, 265
coplanar, 9
cross product, 5, 259
geometric interpretation, 260
directional derivative, 274
dot product, 5
eigenvalue, 47
eigenvector, 47, 48
linearly dependent, 9
linearly independent, 9
magnitude, 4
orthogonal, 7, 257
scalar product, 7, 256
geometric interpretation, 256
scalar triple product, 262
standard unit, 254
steady state, 72
unit, 254
unit coordinate, 254
vector field, 265
differentiation, 269
flux, 317
velocity, 308

wave equation, 390


wavelet, 627, 664
applications, 639
coefficients, 631
multiresolution analysis, 635
series, 632
transform, 633
wavelet correlation coefficient, 664
wavelet spectrum, 664
