
Using optimization routines from scipy and statsmodels

In [1]: import scipy.linalg as la

For root finding, we generally need to provide a starting point in the vicinity of the root. For 1D root finding, this is often provided as a bracket (a, b) where a and b have opposite signs.

Univariate roots and fixed points

In [2]: def f(x):
            return x**3 - 3*x + 1

In [3]: x = np.linspace(-3, 3, 100)
        plt.axhline(0, c='red')
        plt.plot(x, f(x));

[Plot of f(x) = x**3 - 3x + 1, which crosses zero three times]

In [4]: from scipy.optimize import brentq, newton

brentq is the recommended method.

In [5]: brentq(f, -3, 0), brentq(f, 0, 1), brentq(f, 1, 3)
Out[5]: (-1.8793852415718166, 0.3472963553337031, 1.532088886237956)

Secant method

In [6]: newton(f, -3), newton(f, 0), newton(f, 3)
Out[6]: (-1.8793852415718166, 0.34729635533385205, 1.5320888862379572)

Newton-Raphson method

In [7]: fprime = lambda x: 3*x**2 - 3
        newton(f, -3, fprime), newton(f, 0, fprime), newton(f, 3, fprime)
Out[7]: (-1.8793852415718166, 0.34729635533386066, 1.532088886237956)

Analytical solution using sympy to find roots of a polynomial

In [8]: from sympy import symbols, N, real_roots

In [9]: x = symbols('x', real=True)
        sol = real_roots(x**3 - 3*x + 1)
        list(map(N, sol))
Out[9]: [-1.87938524157182, 0.347296355333861, 1.53208888623796]

Finding fixed points

Finding the fixed points of a function g(x) = x is the same as finding the roots of g(x) - x. However, specialized algorithms also exist, e.g. scipy.optimize.fixed_point.

In [10]: from scipy.optimize import fixed_point

In [11]: x = np.linspace(-3, 3, 100)
         plt.plot(x, f(x), color='red')
         plt.plot(x, x);

[Plot of f(x) together with the line y = x; the crossings are the fixed points]

In [12]: fixed_point(f, 0), fixed_point(f, -3), fixed_point(f, 3)
Out[12]: (0.2541016883650524, -2.114907541476756, 1.8608058531117035)

In [13]: newton(lambda x: f(x) - x, 0), newton(lambda x: f(x) - x, -3), newton(lambda x: f(x) - x, 3)
Out[13]: (0.2541016883650524, -2.114907541476814, 1.8608058531117062)

In [14]: def f(x, r):
             """Discrete logistic equation."""
             return r*x*(1-x)

In [15]: n = 100
         fps = np.zeros(n)
         for i, r in enumerate(np.linspace(0, 4, n)):
             fps[i] = fixed_point(f, 0.5, args=(r,))

         plt.plot(np.linspace(0, 4, n), fps)
         plt.axis([0, 4, -0.1, 1.1])
         pass

[Plot of the fixed point of the logistic map as a function of r]

Note that we don't know anything about the stability of the fixed point. Beyond r = 3, the fixed point is unstable, even chaotic, but we would never know that just from the plot above.

In [16]: xs = []
         for i, r in enumerate(np.linspace(0, 4, 400)):
             x = 0.5
             for j in range(10000):
                 x = f(x, r)
             for j in range(50):
                 x = f(x, r)
                 xs.append((r, x))
         xs = np.array(xs)
         plt.scatter(xs[:,0], xs[:,1])
         plt.axis([0, 4, -0.1, 1.1])
         pass

[Bifurcation diagram of the logistic map: a single stable branch up to r = 3, then period doubling and chaos]
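As a quick aside, the stability question can also be answered directly: a fixed point x* of a map g is stable when |g'(x*)| < 1. A minimal sketch of that check for the logistic map above (for g(x) = r x (1 - x) the derivative at the fixed point works out to 2 - r, so stability is lost exactly at r = 3):

    import numpy as np
    from scipy.optimize import fixed_point

    def g(x, r):
        # same logistic map as above
        return r*x*(1 - x)

    def gprime(x, r):
        # derivative of the map
        return r*(1 - 2*x)

    for r in (2.5, 3.5):
        xstar = fixed_point(g, 0.5, args=(r,))
        # stable if |g'(x*)| < 1; for this map |g'(x*)| = |2 - r|
        print(r, xstar, abs(gprime(xstar, r)) < 1)

This prints True for r = 2.5 and False for r = 3.5, which lines up with the bifurcation diagram.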

Multivariate roots and fixed points

Use root to solve polynomial equations. Use fsolve for non-polynomial equations.

In [17]: from scipy.optimize import root, fsolve

Suppose we want to solve a system of m equations with n unknowns:

    f(x0, x1) = x1 - 3 x0 (x0 + 1)(x0 - 1)
    g(x0, x1) = 0.25 x0^2 + x1^2 - 1

In [18]: def f(x):
             return (x[1] - 3*x[0]*(x[0]+1)*(x[0]-1),
                     0.25*x[0]**2 + x[1]**2 - 1)

In [19]: sol = root(f, (0.5, 0.5))
         sol.x
Out[19]: array([ 1.11694147,  0.82952422])

In [20]: fsolve(f, (0.5, 0.5))
Out[20]: array([ 1.11694147,  0.82952422])

We can also give the Jacobian.

In [21]: def jac(x):
             # note: the exact derivative of the first equation with respect to x[0]
             # is 3 - 9*x[0]**2; root still converges with the entry used here
             return [[-6*x[0], 1], [0.5*x[0], 2*x[1]]]

In [22]: sol = root(f, (0.5, 0.5), jac=jac)
         sol.x, sol.fun
Out[22]: (array([ 1.11694147,  0.82952422]),
          array([ -4.23383550e-12,  -3.31612515e-12]))

Check that the values found are really roots:

In [23]: np.allclose(f(sol.x), 0)
Out[23]: True

Starting from other initial conditions, different roots may be found.

In [24]: sol = root(f, (12, 12))
         sol.x
Out[24]: array([ 0.77801314, -0.92123498])

In [25]: np.allclose(f(sol.x), 0)
Out[25]: True
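The fixed-point idea carries over to the multivariate case in the same way: a fixed point of a vector map G is just a root of G(x) - x. A minimal sketch, using a made-up contraction map purely for illustration:

    import numpy as np
    from scipy.optimize import root

    def G(x):
        # an arbitrary 2D contraction map (illustrative only)
        return np.array([0.5*np.cos(x[1]), 0.5*np.sin(x[0])])

    sol = root(lambda x: G(x) - x, [0.0, 0.0])
    print(sol.x, np.allclose(G(sol.x), sol.x))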

We will assume that our optimization problem is to minimize some univariate or multivariate function f(x). This is without loss of generality, since to find the maximum we can simply minimize -f(x). We will also assume that we are dealing with multivariate or real-valued smooth functions - non-smooth or discrete functions (e.g. integer-valued) are outside the scope of this course.

To find the minimum of a function, we first need to be able to express the function as a mathematical expression. For example, in least squares regression, the function that we are optimizing is of the form y_i - f(x_i, θ) for some parameter(s) θ.

To choose an appropriate optimization algorithm, we should at least answer these two questions if possible:

1. Is the function convex?
2. Are there any constraints that the solution must meet?

Finally, we need to realize that optimization methods are nearly always designed to find local optima. For convex problems there is only one minimum, so this is not a problem. However, if there are multiple local minima, heuristics such as multiple random starts must often be adopted to find a "good enough" solution.

Is the function convex?

Convex functions are very nice because they have a single global minimum, and there are very efficient algorithms for solving large convex systems. Intuitively, a function is convex if every chord joining two points on the function lies above the function. More formally, a function is convex if

    f(ta + (1-t)b) <= t f(a) + (1-t) f(b)

for all t between 0 and 1 - this is shown in the figure below.

In [26]: def f(x):
             return (x-4)**2 + x + 1

         with plt.xkcd():
             x = np.linspace(0, 10, 100)

             plt.plot(x, f(x))
             ymin, ymax = plt.ylim()
             plt.axvline(2, ymin, f(2)/ymax, c='red')
             plt.axvline(8, ymin, f(8)/ymax, c='red')
             plt.scatter([4, 4], [f(4), f(2) + ((4-2)/(8-2.))*(f(8)-f(2))],
                         edgecolor=['blue', 'red'], facecolor='none')
             plt.plot([2, 8], [f(2), f(8)])
             plt.xticks([2, 4, 8], ('a', 'ta + (1-t)b', 'b'), fontsize=20)
             plt.text(0.2, 40, 'f(ta + (1-t)b) < tf(a) + (1-t)f(b)', fontsize=20)
             plt.xlim([0, 10])
             plt.yticks([])
             plt.suptitle('Convex function', fontsize=20)

[XKCD-style figure titled "Convex function": the chord from a to b lies above the curve, annotated f(ta + (1-t)b) < tf(a) + (1-t)f(b)]

Warm up exercise

Show that f(x) = -log(x) is a convex function.
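One way to check the warm-up claim (a quick sketch using the second-derivative test that the next section develops): the second derivative of -log(x) is 1/x^2, which is strictly positive for x > 0, so -log(x) is convex on its domain.

    from sympy import symbols, log, diff

    x = symbols('x', positive=True)
    diff(-log(x), x, 2)   # 1/x**2, positive for x > 0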

Checking if a function is convex using the Hessian

The formal definition is only useful for showing that a function is not convex, by finding a counter-example. More practically, a twice differentiable function is convex if its Hessian is positive semi-definite, and strictly convex if the Hessian is positive definite. For example, suppose we want to minimize the scalar-valued function

    f(x1, x2, x3) = x1^2 + 2 x2^2 + 3 x3^2 + 2 x1 x2 + 2 x1 x3

In [27]: from sympy import symbols, hessian, Function, init_printing, expand, Matrix, diff

         x, y, z = symbols('x y z')
         f = symbols('f', cls=Function)
         init_printing()

In [28]: f = x**2 + 2*y**2 + 3*z**2 + 2*x*y + 2*x*z
         f
Out[28]: x**2 + 2*x*y + 2*x*z + 2*y**2 + 3*z**2

In [29]: H = hessian(f, (x, y, z))
         H
Out[29]: Matrix([
         [2, 2, 2],
         [2, 4, 0],
         [2, 0, 6]])

In [30]: np.real_if_close(la.eigvals(np.array(H).astype('float')))
Out[30]: array([ 0.24122952,  7.06417777,  4.69458271])

Since the matrix is symmetric and all eigenvalues are positive, the Hessian is positive definite and the function is convex.

Combining convex functions

The following rules may be useful to determine if more complex functions are convex:

1. The pointwise maximum of convex functions is convex.
2. If the functions f and g are convex and a >= 0 and b >= 0, then the function af + bg is convex.
3. If the function U is convex and the function g is nondecreasing and convex, then the function f defined by f(x) = g(U(x)) is convex.

Many more technical details about convexity and convex optimization can be found in the convex optimization book by Boyd and Vandenberghe (see Resources).
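As a quick spot check of rule 3 (a sketch, reusing the Hessian test from the cells above): x^2 + y^2 is convex and exp is nondecreasing and convex, so exp(x^2 + y^2) should be convex, and its Hessian at any sample point should be positive definite.

    import numpy as np
    from sympy import symbols, hessian, exp

    x, y = symbols('x y')
    f = exp(x**2 + y**2)                        # g(U(x)) with g = exp, U = x**2 + y**2
    H = hessian(f, (x, y)).subs({x: 1, y: 1})   # Hessian at the sample point (1, 1)
    np.linalg.eigvalsh(np.array(H).astype('float'))   # both eigenvalues are positive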
Are there any constraints that the solution must meet?

In general, optimization without constraints is easier to solve than optimization in the presence of constraints. In any case, the solutions may be very different in the presence or absence of constraints, so it is important to know if there are any constraints. We will see some examples of two general strategies: convert a problem with constraints into one without constraints, or use an algorithm that can optimize with constraints.

One of the most convenient libraries to use is scipy.optimize, since it is already part of the Anaconda installation and it has a fairly intuitive interface.

In [31]: from scipy import optimize as opt

In [32]: def f(x):
             return x**4 + 3*(x-2)**3 - 15*(x)**2 + 1

In [33]: x = np.linspace(-8, 5, 100)
         plt.plot(x, f(x));

[Plot of f(x) over [-8, 5], with a deep minimum near x = -5.5 and a shallower one near x = 2.7]

The minimize_scalar function will find the minimum, and can also be told to search within given bounds. By default, it uses the Brent algorithm, which combines a bracketing strategy with a parabolic approximation.

In [34]: opt.minimize_scalar(f, method='Brent')
Out[34]:  nit: 11
         nfev: 12
            x: -5.5288011252196627
          fun: -803.39553088258865

In [35]: opt.minimize_scalar(f, method='bounded', bounds=[0, 6])
Out[35]:    nfev: 12
         success: True
             fun: -54.210039977127622
          status: 0
         message: 'Solution found.'
               x: 2.6688651040396522

Local and global minima

In [36]: def f(x, offset):
             return -np.sinc(x - offset)

In [37]: x = np.linspace(-20, 20, 100)
         plt.plot(x, f(x, 5));

[Plot of -sinc(x - 5), with many local minima and a global minimum at x = 5]

In [38]: # note how additional function arguments are passed in
         sol = opt.minimize_scalar(f, args=(5,))
         sol
Out[38]:  nit: 10
         nfev: 11
            x: -1.4243871263953001
          fun: -0.049029624014074166

In [39]: plt.plot(x, f(x, 5))
         plt.axvline(sol.x, c='red')
         pass

[Plot showing that only a shallow local minimum was found]

We can try multiple random starts to find the global minimum.

In [40]: lower = np.random.uniform(-20, 20, 100)
         upper = lower + 1
         sols = [opt.minimize_scalar(f, args=(5,), bracket=(l, u)) for (l, u) in zip(lower, upper)]

In [41]: idx = np.argmin([sol.fun for sol in sols])
         sol = sols[idx]

In [42]: plt.plot(x, f(x, 5))
         plt.axvline(sol.x, c='red');

[Plot showing the global minimum at x = 5 found by the best of the random starts]

Using a stochastic algorithm

See the documentation for the basinhopping algorithm, which also works with multivariate scalar optimization. Note that this is a heuristic and is not guaranteed to find the global minimum.

In [43]: from scipy.optimize import basinhopping

         x0 = 0
         sol = basinhopping(f, x0, stepsize=1, minimizer_kwargs={'args': (5,)})
         sol
Out[43]:                   nfev: 1812
                           njev: 604
                            fun: -1.0
                            nit: 100
                        message: ['requested number of basinhopping iterations completed successfully']
                              x: array([ 5.])
          minimization_failures: 0

In [44]: plt.plot(x, f(x, 5))
         plt.axvline(sol.x, c='red');

[Plot showing the global minimum found by basinhopping]

Minimizing a multivariate function f: R^n -> R

We will next move on to optimization of multivariate scalar functions, where the scalar may (say) be the norm of a vector. Minimizing a multivariable set of equations f: R^n -> R^m is not well-defined, but we will later see how to solve the closely related problem of finding roots or fixed points of such a set of equations.

We will use the Rosenbrock "banana" function to illustrate unconstrained multivariate optimization. In 2D, this is

    f(x, y) = b(y - x^2)^2 + (a - x)^2

and the standard expression takes a = 1 and b = 100. The function has a global minimum at (1, 1).

Conditioning of optimization problem

With these values for a and b, the problem is ill-conditioned. As we shall see, one of the factors affecting the ease of optimization is the condition number of the curvature (Hessian). When the condition number is high, the gradient may not point in the direction of the minimum, and simple gradient descent methods may be inefficient since they may be forced to take many sharp turns.

In [45]: from sympy import symbols, hessian, Function, N
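The later cells use a rosen function and an (X, Y, Z) plotting grid. A minimal sketch of how they might be set up, together with the condition-number computation suggested by the sympy import above (assuming the standard a = 1, b = 100 form):

    import numpy as np
    from sympy import symbols, hessian, N

    x, y = symbols('x y')
    rosen_sym = 100*(y - x**2)**2 + (1 - x)**2
    H = hessian(rosen_sym, (x, y)).subs({x: 1, y: 1})
    N(H.condition_number())   # roughly 2.5e3 - the curvature at the minimum is ill-conditioned

    def rosen(x):
        """Generalized n-dimensional Rosenbrock function."""
        return sum(100*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

    # grid used for the contour plots below
    x = np.linspace(-5, 5, 100)
    y = np.linspace(-5, 5, 100)
    X, Y = np.meshgrid(x, y)
    Z = rosen(np.vstack([X.ravel(), Y.ravel()])).reshape((100, 100))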

Plotting the Rosenbrock function over the grid, with the global minimum marked:

    plt.contour(X, Y, Z, np.arange(10)**5, cmap='jet')
    plt.text(1, 1, 'x', va='center', ha='center', color='red', fontsize=20);

[Contour plot of the Rosenbrock function with an x at (1, 1)]

Zooming in to the global minimum at (1, 1):

In [50]: x = np.linspace(0, 2, 100)
         y = np.linspace(0, 2, 100)
         X, Y = np.meshgrid(x, y)
         Z = rosen(np.vstack([X.ravel(), Y.ravel()])).reshape((100, 100))

In [51]: plt.contour(X, Y, Z, [rosen(np.array([k, k])) for k in np.linspace(1, 1.5, 10)], cmap='jet')
         plt.text(1, 1, 'x', va='center', ha='center', color='red');

[Zoomed contour plot of the long curved valley around (1, 1)]

The gradient (or Jacobian) at a point indicates the direction of steepest ascent. Since we are looking for a minimum, one obvious possibility is to take a step in the opposite direction to the gradient. We weight the size of the step by a factor α, known in the machine learning literature as the learning rate. If α is small, the algorithm will eventually converge towards a local minimum, but it may take a long time. If α is large, the algorithm may converge faster, but it may also overshoot and never find the minimum. Gradient descent is also known as a first order method because it requires calculation of the first derivative at each iteration.

Some algorithms also determine the appropriate value of α at each stage by using a line search, i.e.

    α* = argmin_α f(x_k - α ∇f(x_k))

which is a 1D optimization problem.

As suggested above, the problem is that the gradient may not point towards the global minimum, especially when the condition number is large, and we are then forced to use a small α for convergence. Let's warm up by minimizing the trivial function f(x, y) = x^2 + y^2 to illustrate the basic idea of gradient descent.

In [54]: def f(x):
             return x[0]**2 + x[1]**2

         def grad(x):
             return np.array([2*x[0], 2*x[1]])

         a = 0.1 # learning rate
         x0 = np.array([1.0, 1.0])
         print('start', x0)
         for i in range(41):
             x0 -= a * grad(x0)
             if i%5 == 0:
                 print(i, x0)

         start [ 1.  1.]
         0 [ 0.8  0.8]
         5 [ 0.262144  0.262144]
         10 [ 0.08589935  0.08589935]
         15 [ 0.0281475   0.0281475 ]
         20 [ 0.00922337  0.00922337]
         25 [ 0.00302231  0.00302231]
         30 [ 0.00099035  0.00099035]
         35 [ 0.00032452  0.00032452]
         40 [ 0.00010634  0.00010634]
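The line search mentioned above can itself be done with minimize_scalar; a minimal sketch on the same quadratic:

    import numpy as np
    from scipy import optimize as opt

    def f(x):       # same quadratic as above
        return x[0]**2 + x[1]**2

    def grad(x):
        return np.array([2*x[0], 2*x[1]])

    x0 = np.array([1.0, 1.0])
    for i in range(5):
        g = grad(x0)
        if np.allclose(g, 0):
            break
        # alpha* = argmin_a f(x0 - a*g): a 1D problem
        alpha = opt.minimize_scalar(lambda a: f(x0 - a*g)).x
        x0 = x0 - alpha*g
        print(i, alpha, x0)

For this quadratic the exact line search gives α = 0.5 and the iteration lands on the minimum in a single step; for ill-conditioned problems like the Rosenbrock function the steps zig-zag instead.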
Gradient descent for least squares minimization

Usually, when we optimize, we are not just finding the minimum, but also want to know the parameters that give us the minimum. As a simple example, suppose we want to find parameters that minimize the least squares difference between a linear model and some data. Suppose we have some data (0,1), (1,2), (2,3), (3,3.5), (4,6), (5,9), (6,8) and want to find a line y = β0 + β1 x that is the best least squares fit. One way to do this is to solve X'Xβ = X'y, but we want to show how this can be formulated as a gradient descent problem.

We want to find β = (β0, β1) that minimizes the squared differences

    r = Σ (β0 + β1 x_i - y_i)^2

We calculate the gradient with respect to β as

    ∇r = ( Σ 2(β0 + β1 x_i - y_i),  Σ 2 x_i (β0 + β1 x_i - y_i) )

and apply gradient descent (the constant factor of 2 is absorbed into the learning rate in the code below).

In [13]: def f(x, y, b):
             """Helper function."""
             return (b[0] + b[1]*x - y)

         def grad(x, y, b):
             """Gradient of objective function with respect to parameters b."""
             n = len(x)
             return np.array([sum(f(x, y, b)), sum(x*f(x, y, b))])

In [10]: x, y = map(np.array, zip((0,1), (1,2), (2,3), (3,3.5), (4,6), (5,9), (6,8)))

In [16]: a = 0.001 # learning rate
         b0 = np.zeros(2)
         for i in range(10000):
             b0 -= a * grad(x, y, b0)

In [17]: plt.scatter(x, y, s=30)
         plt.plot(x, b0[0] + b0[1]*x, color='red')
         pass

[Scatter plot of the data with the fitted red line]
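As a quick check, the closed-form solution of the normal equations X'Xβ = X'y mentioned above can be computed directly and compared with the gradient descent estimate b0 (a small sketch using the x, y and b0 arrays from the cells above):

    import numpy as np

    X = np.c_[np.ones(len(x)), x]           # design matrix with a column of 1s
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta, b0)                          # the two should agree to several decimals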
Gradient descent to minimize the Rosen function using scipy.optimize

Because gradient descent is unreliable in practice, it is not part of the scipy.optimize suite of functions, but we will write a custom function below to illustrate how to use gradient descent while maintaining the scipy.optimize interface.

In [52]: def rosen_der(x):
             """Derivative of generalized Rosen function."""
             xm = x[1:-1]
             xm_m1 = x[:-2]
             xm_p1 = x[2:]
             der = np.zeros_like(x)
             der[1:-1] = 200*(xm - xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1 - xm)
             der[0] = -400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0])
             der[-1] = 200*(x[-1] - x[-2]**2)
             return der

Warning

One of the most common causes of failure of optimization is that the gradient or Hessian function is specified incorrectly. You can check for this using check_grad, which compares the analytical gradient with one calculated using finite differences.

In [53]: from scipy.optimize import check_grad

         for x in np.random.uniform(-2, 2, (10, 2)):
             x0 = np.array(x)
             print(x0, check_grad(rosen, rosen_der, x0))

         [ 0.84795961  0.9681431 ] 4.12099762158e-06
         [ 0.25702089  1.12043717] 2.22491327726e-06
         [ 1.06226702 -1.80970275] 3.12407324679e-05
         [ 1.5334192  -1.5059954 ] 1.13208298556e-05
         [ 0.193027    1.80868951] 3.71611303963e-06
         [ 1.32643444 -1.18624149] 3.25555261955e-05
         [ 0.48144441 -0.00341954] 2.6781137655e-06
         [ 0.85557139  1.12996825] 3.31235322885e-06
         [ 1.00825153 -1.85587153] 1.62466187316e-05
         [ 0.41514196  0.6850504 ] 1.76349327916e-06

Writing a custom function for the scipy.optimize interface

In [55]: def custmin(fun, x0, args=(), maxfev=None, alpha=0.0002,
                     maxiter=100000, tol=1e-10, callback=None, **options):
             """Implements simple gradient descent for the Rosen function."""
             bestx = x0
             besty = fun(x0)
             funcalls = 1
             niter = 0
             improved = True
             stop = False

             while improved and not stop and niter < maxiter:
                 niter += 1
                 # the next 2 lines are gradient descent
                 step = alpha * rosen_der(bestx)
                 bestx = bestx - step

                 besty = fun(bestx)
                 funcalls += 1

                 if la.norm(step) < tol:
                     improved = False
                 if callback is not None:
                     callback(bestx)
                 if maxfev is not None and funcalls >= maxfev:
                     stop = True
                     break

             return opt.OptimizeResult(fun=besty, x=bestx, nit=niter,
                                       nfev=funcalls, success=(niter > 1))

In [56]: def reporter(p):
             """Reporter function to capture intermediate states of optimization."""
             global ps
             ps.append(p)

In [57]: # Initial starting position
         x0 = np.array([4, -4.1])

In [58]: ps = [x0]
         opt.minimize(rosen, x0, method=custmin, callback=reporter)
Out[58]:     nit: 100000
         success: True
            nfev: 100001
               x: array([ 0.9998971 ,  0.99979381])
             fun: 1.0604663472448339e-08

In [59]: x = np.linspace(-5, 5, 100)
         y = np.linspace(-5, 5, 100)
         X, Y = np.meshgrid(x, y)
         Z = rosen(np.vstack([X.ravel(), Y.ravel()])).reshape((100, 100))

In [60]: ps = np.array(ps)
         plt.figure(figsize=(12, 4))
         plt.subplot(121)
         plt.contour(X, Y, Z, np.arange(10)**5, cmap='jet')
         plt.plot(ps[:, 0], ps[:, 1], '-ro')
         plt.subplot(122)
         plt.semilogy(range(len(ps)), rosen(ps.T));

[Left: the descent path on the contour plot. Right: objective value versus iteration on a log scale, showing slow convergence over ~100000 steps]

Newton's method and variants

Recall Newton's method for finding roots of a univariate function:

    x_{k+1} = x_k - f(x_k) / f'(x_k)

When we are looking for a minimum, we are looking for the roots of the derivative f'(x), so

    x_{k+1} = x_k - f'(x_k) / f''(x_k)

Newton's method can also be seen as a Taylor series approximation

    f(x + h) = f(x) + h f'(x) + (h^2 / 2) f''(x)

At the function minimum, the derivative is 0, so

    lim_{h->0} [f(x + h) - f(x)] / h = f'(x) + (h/2) f''(x)
    0 = f'(x) + (h/2) f''(x)

and letting Δx = h/2, we get that the Newton step is

    Δx = - f'(x) / f''(x)

The multivariate analog replaces f' with the Jacobian and f'' with the Hessian, so the Newton step is

    Δx = - H^{-1}(x) ∇f(x)

Second order methods

Second order methods solve for H^{-1} and so require calculation of the Hessian (either provided or approximated using finite differences). For efficiency reasons, the Hessian is not directly inverted, but solved for using a variety of methods such as conjugate gradient. An example of a second order method in the optimize package is Newton-CG.

In [61]: from scipy.optimize import rosen, rosen_der, rosen_hess
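A bare-bones sketch of the Newton step Δx = -H^{-1}∇f described above, using the derivatives just imported (no line search or safeguards, so this is purely illustrative):

    import numpy as np
    from scipy.optimize import rosen, rosen_der, rosen_hess

    x = np.array([4.0, -4.1])
    for i in range(10):
        g = rosen_der(x)
        H = rosen_hess(x)
        x = x - np.linalg.solve(H, g)   # Newton step: solve H dx = grad f
        print(i, x, rosen(x))

From this starting point the iterates home in on (1, 1) within a handful of steps, although the objective is not monotonically decreasing along the way, and without a line search raw Newton can misbehave for less friendly starting points.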
10" 0 5 0 5 0 0 3 0 45 Zeroth order methods Finally, there are some optimization algorithms not based on the Newton method, but on other heuristic search strategies that do not require any derivatives, only function evaluations. One well-known example is the Nelder-Mead simplex algorithm. In (66): ps = [x0] 4 [J > opt.minimize(rosen, x0, method='nelder-nead’, callback=reporter) outtes}: fev: 162 [o> success: True status: 0 Tun: 5.262756878429089¢-10 nit: as mnessage: ‘Optinization terminated successfully,” x: array([ 099998846, 0.99997494)) In (67): ps = mp.array(ps) a By pit tigurectigsiz plt.subplot¢121) plt.contour(x, ¥, 2, np.arange(t0)**5, ema plt.plot(pst:, 91, pst:, 11. '-ro") pit. subplet(122) plt.semilogy(range(len(ps)),rosen(ps.1)): 10° ‘ we 2 ° 4 K 10° ca o 2 4 on nn oD mo mH 10 10 10° 10? 10* 10* Lagrange multipliers and constrained optimization Recall why Lagrange multipliers are useful for constrained optimization - a stationary point must be where the constraint, surface g touches a level set of the function f (since the value of f does not change on a level set). At that point, f and g are parallel, and hence their gradients are also parallel (since the gradient is normal to the level set). So we want to solve Vi =—AVg or equivalently, VE+AVG hitps/Ipeople.duke.edul~cce' 4/eta-663-20°6/'3_ Optimization htm 16128 (0710572022, 10:53 Using optimization routines from scipy and slatsmodels — Computational Stasis in Python Lagrange multipliers Numerical example of using Lagrange multipliers Maximize f(t, y, 2) = ay + yz subject to the constraints « + 2y = 6 and x ~ 32 = 0. We set up the equations F OGY. 2. W)= xy + yE—AO + 2y ~ 6) ~ px ~ 32) Now set partial derivatives to zero and solve the following set of equations y-A-p=0 z+z-2d yt+3n z+ 2y-6= 2-3: which is a linear equation in x,y, 2A, o1o0 7 10 1 -2 010 0 120 0 10-3 0 \pmatrix(x \\y W\ 2 \\ \lambda \\ \mu} = \pmatrix(0 Xk 0\\ 0 6 \\ 0} In {68}: A= np.array(t “Pp 10, 1,8, -1, -11, [1 0, 1, -2, 0), 0, 1, 0, 0, 3] [, 2, 0, 0, 03, 1, 0,-3, 9, 01) np.array( to 0,8,01) gol = la.solve(A, ») In {69}: sol ‘Pp putea}: array 3, 1.5, 12, 2, 0.81) “bp In (701: def F(x, y, 2)

In [71]: f(*sol[:3])
Out[71]: 6.0

Check using scipy.optimize

In [72]: # Convert to minimization problem by negating function
         def f(x):
             return -(x[0]*x[1] + x[1]*x[2])

In [73]: cons = ({'type': 'eq',
                  'fun': lambda x: np.array([x[0] + 2*x[1] - 6, x[0] - 3*x[2]])})

In [74]: x0 = np.array([2, 2, 0.67])
         x = opt.minimize(f, x0, constraints=cons)
         x
Out[74]:    nfev: 15
             fun: -5.9999999999999689
             nit: 3
            njev: 3
         success: True
          status: 0
         message: 'Optimization terminated successfully.'
               x: array([ 2.99999979,  1.50000011,  0.99999993])
             jac: array([-1.5000002 , -3.9999997 , -1.50000012,  0.        ])
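As a small additional check (a sketch using the sol array from the linear solve above), the gradients really are parallel at the solution, with λ = 2 and μ = -0.5:

    import numpy as np

    x_, y_, z_, lam, mu = sol
    grad_f = np.array([y_, x_ + z_, y_])      # gradient of xy + yz
    grad_g1 = np.array([1, 2, 0])             # gradient of x + 2y - 6
    grad_g2 = np.array([1, 0, -3])            # gradient of x - 3z
    print(np.allclose(grad_f, lam*grad_g1 + mu*grad_g2))   # True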

Another example of constrained optimization

Many real-world optimization problems have constraints - for example, a set of parameters may have to sum to 1.0 (equality constraint), or some parameters may have to be non-negative (inequality constraint). Sometimes, the constraints can be incorporated into the function to be minimized; for example, the non-negativity constraint p > 0 can be removed by substituting p = e^q and optimizing for q. Using such workarounds, it may be possible to convert a constrained optimization problem into an unconstrained one, and use the methods discussed above to solve the problem.

Alternatively, we can use optimization methods that allow the specification of constraints directly in the problem statement, as shown in this section. Internally, constraint violation penalties, barriers and Lagrange multipliers are some of the methods used to handle these constraints.

We use the example provided in the Scipy tutorial to illustrate how to set constraints: minimize

    f(x, y) = -(2xy + 2x - x^2 - 2y^2)

subject to the constraints

    x^3 - y = 0
    y - (x - 1)^4 - 2 >= 0

and the bounds

    0.5 <= x <= 1.5
    1.5 <= y <= 2.5

In [75]: def f(x):
             return -(2*x[0]*x[1] + 2*x[0] - x[0]**2 - 2*x[1]**2)

In [76]: x = np.linspace(0, 3, 100)
         y = np.linspace(0, 3, 100)
         X, Y = np.meshgrid(x, y)
         Z = f(np.vstack([X.ravel(), Y.ravel()])).reshape((100, 100))
         plt.contour(X, Y, Z, np.arange(-1.99, 10, 1), cmap='jet')
         plt.plot(x, x**3, 'k:', linewidth=1)
         plt.plot(x, (x-1)**4 + 2, 'k:', linewidth=1)
         plt.fill([0.5, 0.5, 1.5, 1.5], [2.5, 1.5, 1.5, 2.5], alpha=0.3)
         plt.axis([0, 3, 0, 3])
Out[76]: [0, 3, 0, 3]

[Contour plot of f with the two constraint curves (dotted) and the feasible bounds box (shaded)]

To set constraints, we pass in a dictionary with keys type, fun and jac. Note that the inequality constraint assumes a C_j(x) >= 0 form. As usual, the jac is optional and will be numerically estimated if not provided.

In [77]: cons = ({'type': 'eq',
                  'fun': lambda x: np.array([x[0]**3 - x[1]]),
                  'jac': lambda x: np.array([3.0*(x[0]**2.0), -1.0])},
                 {'type': 'ineq',
                  'fun': lambda x: np.array([x[1] - (x[0]-1)**4 - 2])})

         bnds = ((0.5, 1.5), (1.5, 2.5))

In [78]: x0 = [0, 2.5]

Unconstrained optimization

In [79]: ux = opt.minimize(f, x0, constraints=None)
         ux
Out[79]:     nfev: 20
         hess_inv: array([[ 0.99999996,  0.49999996],
                          [ 0.49999996,  0.49999997]])
              fun: -1.9999999999999987
              nit: 3
             njev: 5
          success: True
           status: 0
          message: 'Optimization terminated successfully.'
                x: array([ 1.99999996,  0.99999996])
              jac: array([ 0.,  0.])

Constrained optimization

In [80]: cx = opt.minimize(f, x0, bounds=bnds, constraints=cons)
         cx
Out[80]:    nfev: 21
             fun: 2.0499754720925521
             nit: 5
            njev: 5
         success: True
          status: 0
         message: 'Optimization terminated successfully.'
               x: array([ 1.26089314,  2.00463288])
             jac: array([-3.48747873,  5.49674428,  0.        ])

In [81]: x = np.linspace(0, 3, 100)
         y = np.linspace(0, 3, 100)
         X, Y = np.meshgrid(x, y)
         Z = f(np.vstack([X.ravel(), Y.ravel()])).reshape((100, 100))
         plt.contour(X, Y, Z, np.arange(-1.99, 10, 1), cmap='jet')
         plt.plot(x, x**3, 'k:', linewidth=1)
         plt.plot(x, (x-1)**4 + 2, 'k:', linewidth=1)
         plt.text(ux['x'][0], ux['x'][1], 'x', va='center', ha='center', size=20, color='blue')
         plt.text(cx['x'][0], cx['x'][1], 'x', va='center', ha='center', size=20, color='red')
         plt.fill([0.5, 0.5, 1.5, 1.5], [2.5, 1.5, 1.5, 2.5], alpha=0.3)
         plt.axis([0, 3, 0, 3]);

[Contour plot with the unconstrained optimum (blue x) and the constrained optimum (red x)]
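A quick way to convince yourself the constrained answer is consistent (a small check on the cx result above): both constraint functions should be approximately zero at the solution, i.e. the equality constraint holds and the inequality constraint is active.

    x1_, x2_ = cx.x
    print(x1_**3 - x2_)               # equality constraint, approximately 0
    print(x2_ - (x1_ - 1)**4 - 2)     # inequality constraint, >= 0 and approximately 0 here (active)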
Some applications of optimization

Curve fitting

Sometimes, we simply want to use non-linear least squares to fit a function to data, perhaps to estimate parameters for a mechanistic or phenomenological model. The curve_fit function uses the quasi-Newton Levenberg-Marquardt algorithm to perform such fits. Behind the scenes, curve_fit is just a wrapper around the leastsq function that does nonlinear least squares fitting.

In [82]: from scipy.optimize import curve_fit

In [83]: def logistic4(x, a, b, c, d):
             """The four parameter logistic function is often used to fit dose-response relationships."""
             return ((a - d)/(1.0 + ((x/c)**b))) + d

In [84]: nobs = 24
         xdata = np.linspace(0.5, 3.5, nobs)
         ptrue = [10, 3, 1.5, 12]
         ydata = logistic4(xdata, *ptrue) + 0.5*np.random.random(nobs)

In [85]: popt, pcov = curve_fit(logistic4, xdata, ydata)

In [86]: perr = yerr = np.sqrt(np.diag(pcov))
         print('Param\tTrue\tEstim (+/- 1 SD)')
         for p, pt, po, pe in zip('abcd', ptrue, popt, perr):
             print('%s\t%5.2f\t%5.2f (+/-%5.2f)' % (p, pt, po, pe))

         Param   True    Estim (+/- 1 SD)
         a       10.00   10.38 (+/-  0.08)
         b        3.00    4.08 (+/-  0.77)
         c        1.50    1.47 (+/-  0.07)
         d       12.00   12.08 (+/-  0.08)

In [87]: x = np.linspace(0, 4, 100)
         y = logistic4(x, *popt)
         plt.plot(xdata, ydata, 'o')
         plt.plot(x, y);

[Plot of the data points and the fitted four-parameter logistic curve]

Finding parameters for ODE models

This is a specialized application of curve_fit, in which the curve to be fitted is defined implicitly by an ordinary differential equation

    dx/dt = -k x

and we want to use observed data to estimate the parameter k and the initial value x0. Of course this can be explicitly solved, but the same approach can be used to find multiple parameters for n-dimensional systems of ODEs. A more elaborate example fits a system of ODEs to model the zombie apocalypse.

In [88]: from scipy.integrate import odeint

         def f(x, t, k):
             """Simple exponential decay."""
             return -k*x

         def x(t, k, x0):
             """Solution to the ODE x'(t) = f(t, x, k) with initial condition x(0) = x0."""
             x = odeint(f, x0, t, args=(k,))
             return x.ravel()

In [89]: # True parameter values
         x0 = 10
         k = 0.1*np.pi

         # Some random data generated from the closed form solution plus Gaussian noise
         ts = np.sort(np.random.uniform(0, 10, 200))
         xs = x0*np.exp(-k*ts) + np.random.normal(0, 0.7, 200)

         popt, cov = curve_fit(x, ts, xs)
         k_opt, x0_opt = popt

         print("k = %g" % k_opt)
         print("x0 = %g" % x0_opt)

         k = 0.313008
         x0 = 9.28547

In [90]: import matplotlib.pyplot as plt

         t = np.linspace(0, 10, 100)
         plt.plot(ts, xs, 'r.', t, x(t, k_opt, x0_opt), '-');

[Plot of the noisy decay data (red dots) and the fitted exponential curve]

Another example of fitting a system of ODEs using the lmfit package

You may have to install the lmfit <http://cars9.uchicago.edu/software/python/lmfit/index.html> package using pip and restart your kernel. The lmfit algorithm is another wrapper around scipy.optimize.leastsq but allows for richer model specification and more diagnostics.

In [91]: from lmfit import minimize, Parameters, Parameter, report_fit
         import warnings

In [92]: def f(xs, t, ps):
             """Lotka-Volterra predator-prey model."""
             try:
                 a = ps['a'].value
                 b = ps['b'].value
                 c = ps['c'].value
                 d = ps['d'].value
             except:
                 a, b, c, d = ps
             x, y = xs
             return [a*x - b*x*y, c*x*y - d*y]

         def g(t, x0, ps):
             """Solution to the ODE x'(t) = f(t, x, k) with initial condition x(0) = x0."""
             x = odeint(f, x0, t, args=(ps,))
             return x

         def residual(ps, ts, data):
             x0 = ps['x0'].value, ps['y0'].value
             model = g(ts, x0, ps)
             return (model - data).ravel()

         t = np.linspace(0, 10, 100)
         x0 = np.array([1, 1])

         a, b, c, d = 3, 1, 1, 1
         true_params = np.array((a, b, c, d))
         np.random.seed(123)
         data = g(t, x0, true_params)
         data += np.random.normal(size=data.shape)

         # set parameters including bounds
         params = Parameters()
         params.add('x0', value=float(data[0, 0]), min=0, max=10)
         params.add('y0', value=float(data[0, 1]), min=0, max=10)
         params.add('a', value=2.0, min=0, max=10)
         params.add('b', value=2.0, min=0, max=10)
         params.add('c', value=2.0, min=0, max=10)
         params.add('d', value=2.0, min=0, max=10)

         # fit model and find predicted values
         result = minimize(residual, params, args=(t, data), method='leastsq')
         final = data + result.residual.reshape(data.shape)

         plt.plot(t, data, 'o')
         plt.plot(t, final, '-', linewidth=2);

         # display fitted statistics
         report_fit(result)

         [[Fit Statistics]]
             # function evals   = 169
             # data points      = 200
             # variables        = 6
             chi-square         = 212.716
             reduced chi-square = 1.096
         [[Variables]]
             x0:   1.02052198 +/- 0.181683 (17.80%)
             y0:   1.07086508 +/- 0.110177 (10.29%)
             a:    3.54       +/- 0.454160 (12.82%)
             b:    1.21       +/- 0.148475 (12.24%)
             c:    0.84529664 +/- 0.079478 (9.40%)
             d:    0.85715536 +/- 0.085627 (9.99%)
         [[Correlations]] (unreported correlations are < 0.100)
             ...

[Plot of the noisy Lotka-Volterra data (points) and the fitted trajectories (lines)]

Optimization can also be used for problems that do not look like curve fitting at all. The next cells place the nodes of a graph by minimizing a spring-like energy: A records which pairs of nodes are joined by a spring (here with stiffness k and rest lengths L), and the positions are adjusted to minimize the total energy.

In [94]: from scipy.spatial.distance import pdist, squareform

         n = 20   # number of nodes
         k = 1    # spring stiffness
         P0 = np.random.uniform(0, 5, (n, 2))

         A = np.ones((n, n))
         A[np.tril_indices_from(A)] = 0
         L = A.copy()

In [95]: def energy(P):
             P = P.reshape((-1, 2))
             D = squareform(pdist(P))
             return 0.5*(k * A * (D - L)**2).sum()

In [96]: energy(P0.ravel())
Out[96]: 479.136508399

In [97]: # fix the position of the first few nodes just to show constraints
         fixed = 4
         bounds = (np.repeat(P0[:fixed, :].ravel(), 2).reshape((-1, 2)).tolist() +
                   [[None, None]] * (2*(n - fixed)))
         bounds[:fixed*2 + 4]
Out[97]: [[1.191249528562059, 1.191249528562059],
          [4.0389554314507805, 4.0389554314507805],
          [4.474841439430058, 4.474841439430058],
          [0.216114460398234, 0.216114460398234],
          [1.5097341813138952, 1.5097341813138952],
          [4.902910992971438, 4.902910992971438],
          [2.6975241127686767, 2.6975241127686767],
          [3.1315862085492215, 3.1315862085492215],
          [None, None],
          [None, None],
          [None, None],
          [None, None]]

In [98]: sol = opt.minimize(energy, P0.ravel(), bounds=bounds)

Visualization

Original placement is BLUE. Optimized arrangement is RED.

In [99]: plt.scatter(P0[:, 0], P0[:, 1], s=25)
         P = sol.x.reshape((-1, 2))
         plt.scatter(P[:, 0], P[:, 1], edgecolors='red', facecolors='none', s=30, linewidth=2);

[Scatter plot of the original (blue) and optimized (red) node positions]
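A one-line check that the optimizer actually improved the layout: the spring energy of the optimized positions should be much lower than that of the random starting positions.

    # energy before vs. after optimization
    print(energy(P0.ravel()), energy(sol.x))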
When we solve standard statistical problems, an optimization procedure similar to the ones discussed here is performed. For example, consider multivariate logistic regression: typically, a Newton-like algorithm known as iteratively reweighted least squares (IRLS) is used to find the maximum likelihood estimate for the generalized linear model family. However, using one of the multivariate scalar minimization methods shown above will also work, for example the BFGS minimization algorithm.

The take home message is that there is nothing magic going on when Python or R fits a statistical model using a formula - all that is happening is that the objective function is set to be the negative of the log likelihood, and the minimum is found using some first or second order optimization algorithm.

In [100]: import statsmodels.api as sm

Logistic regression as optimization

Suppose we have a binary outcome measure Y ∈ {0, 1} that is conditional on some input variable (vector) x ∈ (-∞, +∞). Let the conditional probability be p(x) = P(Y = y | X = x). Given some data, one simple probability model is p(x) = β0 + x·β, i.e. linear regression. This doesn't really work, for the obvious reason that p(x) must be between 0 and 1 as x ranges across the real line. One simple way to fix this is to use the log-odds transformation

    g(x) = log[ p(x) / (1 - p(x)) ] = β0 + x·β

Solving for p, we get

    p(x) = 1 / (1 + e^{-(β0 + x·β)})

As you all know very well, this is logistic regression.

Suppose we have n data points (x_i, y_i) where x_i is a vector of features and y_i is an observed class (0 or 1). For each event, we either have "success" (y = 1) or "failure" (y = 0), so the likelihood looks like the product of Bernoulli random variables. According to the logistic model, the probability of success is p(x_i) if y_i = 1 and 1 - p(x_i) if y_i = 0. So the likelihood is

    L(β0, β) = Π p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i}

and the log-likelihood is

    l(β0, β) = Σ y_i log p(x_i) + (1 - y_i) log(1 - p(x_i))
             = Σ log(1 - p(x_i)) + Σ y_i log[ p(x_i) / (1 - p(x_i)) ]
             = Σ -log(1 + e^{β0 + x_i·β}) + Σ y_i (β0 + x_i·β)

Using the standard "trick", if we augment the matrix X with a column of 1s, we can write β0 + x_i·β as just Xβ.

In [101]: df_ = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
          df_.head()
Out[101]:
             admit  gre   gpa  rank
          0      0  380  3.61     3
          1      1  660  3.67     3
          2      1  800  4.00     1
          3      1  640  3.19     4
          4      0  520  2.93     4

In [102]: # We will ignore the rank categorical value
          cols_to_keep = ['admit', 'gre', 'gpa']
          df = df_[cols_to_keep]
          df.insert(1, 'dummy', 1)
          df.head()
Out[102]:
             admit  dummy  gre   gpa
          0      0      1  380  3.61
          1      1      1  660  3.67
          2      1      1  800  4.00
          3      1      1  640  3.19
          4      0      1  520  2.93

Solving as a GLM with IRLS

This is very similar to what you would do in R, only using Python's statsmodels package. The GLM solver uses a special variant of Newton's method known as iteratively reweighted least squares (IRLS), which will be further described in the lecture on multivariate and constrained optimization.

In [103]: model = sm.GLM.from_formula('admit ~ gre + gpa',
                                      data=df, family=sm.families.Binomial())
          fit = model.fit()
          fit.summary()
Out[103]:
          Generalized Linear Model Regression Results
          Dep. Variable:   admit            No. Observations:  400
          Model:           GLM              Df Residuals:      397
          Model Family:    Binomial         Df Model:          2
          Link Function:   logit            Scale:             1.0
          Method:          IRLS             Log-Likelihood:    -240.17
          Date:            Thu, 18 Feb 2016 Deviance:          480.34
          Time:            15:23:26         Pearson chi2:      398.
          No. Iterations:  6

                       coef  std err      z   P>|z|  [95.0% Conf. Int.]
          Intercept -4.9494    1.075 -4.604   0.000    -7.057   -2.842
          gre        0.0027    0.001  2.544   0.011     0.001    0.005
          gpa        0.7547    0.320  2.361   0.018     0.128    1.381

Solving as a logistic model with bfgs

Note that you can choose any of the scipy.optimize algorithms to fit the maximum likelihood model. This knows about higher order derivatives, so it will be more accurate than the homebrew version below.

In [104]: model2 = sm.Logit.from_formula('admit ~ %s' % '+'.join(df.columns[2:]), data=df)
          fit2 = model2.fit(method='bfgs', maxiter=100)
          fit2.summary()

          Optimization terminated successfully.
                   Current function value: 0.600430
                   Iterations: 23
                   Function evaluations: 65
                   Gradient evaluations: 54
Out[104]:
          Logit Regression Results
          Dep. Variable:  admit             No. Observations:  400
          Model:          Logit             Df Residuals:      397
          Method:         MLE               Df Model:          2
          Date:           Thu, 18 Feb 2016  Pseudo R-squ.:     0.03927
          Time:           15:23:26          Log-Likelihood:    -240.17
          converged:      True              LL-Null:           -249.99
                                            LLR p-value:       5.456e-05

                       coef  std err      z   P>|z|  [95.0% Conf. Int.]
          Intercept -4.9494    1.075 -4.604   0.000    -7.057   -2.842
          gre        0.0027    0.001  2.544   0.011     0.001    0.005
          gpa        0.7547    0.320  2.361   0.018     0.128    1.381

Home-brew logistic regression using a generic minimization function

This is to show that there is no magic going on - you can write the function to minimize directly from the log-likelihood equation and run a minimizer. It will be more accurate if you also provide the derivative (+/- the Hessian for second order methods), but using just the function and numerical approximations to the derivative will also work. As usual, this is for illustration so you understand what is going on - when there is a library function available, you should probably use that instead.

In [105]: def f(beta, y, x):
              """Minus log likelihood function for logistic regression."""
              return -((-np.log(1 + np.exp(np.dot(x, beta)))).sum() + (y*(np.dot(x, beta))).sum())

In [106]: beta0 = np.zeros(3)
          opt.minimize(f, beta0, args=(df['admit'], df.ix[:, 'dummy':]),
                       method='BFGS', options={'gtol': 1e-2})
Out[106]:     nfev: 80
          hess_inv: array([[  1.15242936e+00,  -2.77649429e-04,  -2.81644777e-01],
                           [ -2.77649429e-04,   1.16649220e-06,  -1.22069519e-04],
                           [ -2.81644777e-01,  -1.22069519e-04,   1.02628000e-01]])
               fun: 240.17199089493949
               nit: 8
              njev: 16
           success: True
            status: 0
           message: 'Optimization terminated successfully.'
                 x: array([ -4.94933570e+00,   2.69034401e-03,   7.54734491e-01])
               jac: array([  9.15527344e-05,  -1.96456909e-03,   4.59671021e-04])
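The derivative mentioned above is easy to supply here: for this negative log-likelihood the gradient is X'(p - y) with p = 1/(1 + e^{-Xβ}). A minimal sketch of passing it via jac=, reusing f and the data frame from the cells above:

    import numpy as np

    def grad(beta, y, x):
        """Gradient of the minus log likelihood."""
        p = 1/(1 + np.exp(-np.dot(x, beta)))
        return np.dot(np.transpose(x), p - y)

    beta0 = np.zeros(3)
    opt.minimize(f, beta0, args=(df['admit'], df.ix[:, 'dummy':]),
                 jac=grad, method='BFGS', options={'gtol': 1e-2})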
Optimization with sklearn

There are also many optimization routines in the scikit-learn package, as you already know from the previous lectures. Many machine learning problems essentially boil down to the minimization of some appropriate loss function.

Resources

* Scipy Optimize reference
* Scipy Optimize tutorial
* LMFit - a modeling interface for nonlinear least squares problems
* CVXpy - a modeling interface for convex optimization problems
* Quasi-Newton methods
* Convex optimization book by Boyd & Vandenberghe
* Nocedal and Wright textbook

In [107]: %load_ext version_information
          %version_information scipy, statsmodels, sympy, lmfit
Out[107]:
          Software     Version
          Python       3.5.1 64bit [GCC 4.2.1 (Apple Inc. build 5577)]
          IPython      4.0.1
          OS           Darwin 15.3.0 x86_64 i386 64bit
          scipy        0.16.1
          statsmodels  0.6.1
          sympy        0.7.6.1
          lmfit        0.9.2
          Thu Feb 18 15:23:26 2016 EST
