A Mixed Integer Linear Programming Model for the Optimal Synthesis of Protein Purification Processes with Product Loss
E. Vasquez-Alvarez and Jose M. Pinto*
Department of Chemical Engineering, University of Sao Paulo, Av. Prof. Luciano Gualberto t. 3 n. 380, Sao Paulo, SP, 05508-900 Brazil
Department of Chemical Engineering and Chemistry, Polytechnic University, Six Metrotech Center, Brooklyn, NY, 11201 USA
The objective of this work is to develop a mixed integer linear programming (MILP) model for the synthesis of protein purification processes that incorporates product losses. Mathematical models for each chromatographic technique rely on physicochemical data on the protein mixture, which contains the desired product, and provide information on its potential purification. In previous works, MILP models assumed complete recovery of the desired protein. The present model incorporates losses of the target protein along the purification process in order to evaluate the trade-off between product quality, given by purity, and quantity. A formulation based on a convex hull representation is proposed to calculate the minimum number of steps, chosen from a set of chromatographic techniques, that achieves a specified purity level, as well as the amount of product recovered. Model linearity is achieved by assuming that the product is recovered in discrete percentage levels. The methodology is validated in examples with experimental data, and the results are shown to provide an important guideline for synthesizing purification processes.
Keywords: Purification processes, chromatographic steps, convex hull representation, MILP.
Introduction
Many pharmaceutical products are proteins or polypeptides. These biotechnological products can be obtained from nature by extraction or produced by genetically modified microorganisms (recombinant proteins). In both cases, separation and purification of the desired protein are usually among the most difficult stages of the whole process, and these stages may account for up to 60 % of the overall cost.1 In addition to protein recovery from bioreactions, protein purification includes a series of steps that aim at removing contaminants until the target protein reaches a prespecified purity level. Ideally, protein purification would consist of a single step that extracts 100 % of the pure product. In reality, several steps are needed, and product purity may reach 99 %, usually in the range of 95 to 99 %.2 Depending on the complexity of the mixtures that result from bioreactions, several recovery and purification operations may be necessary to isolate the desired product. The most important operations include chromatographic techniques, which are critical for therapeutic products such as vaccines and antibiotics that require very high purity levels (98 to 99.9 %).
One of the main challenges in the synthesis of downstream purification stages is the appropriate selection and sequencing of chromatographic steps.3 Therefore, optimization methods4 as well as expert systems5 are useful tools for the design and synthesis of protein purification processes. Steffens et al.6 developed a synthesis technique for generating optimal downstream processing flowsheets for biotechnological processes; it integrates the screening of units by means of physical property information into an implicit enumeration synthesis tool. Mathematical programming approaches to process synthesis rely on algebraic models with discrete variables. In previous works, Vasquez-Alvarez et al.4 and Vasquez-Alvarez and Pinto7 developed mixed-integer linear optimization models that implicitly assume complete recovery of the desired protein. The objective of this paper is to develop a mixed integer linear programming (MILP) model for the synthesis of protein purification processes that incorporates product loss, in order to evaluate the trade-off between product quality, given by purity, and quantity.
* Author to whom all correspondence should be addressed. E-mail: [email protected]
E. VASQUEZ-ALVAREZ and JOSE M. PINTO, A Mixed Integer Linear ..., Chem. Biochem. Eng. Q. 17 (1) 27-34 (2003)
The structure of the paper is as follows. The next section describes the problem and its trade-offs. The mathematical formulation, based on a convex hull representation of linear disjunctions, is then presented. Examples are proposed and solved, and a sensitivity analysis on the main parameters of the model is performed. Finally, the major conclusions of the work are discussed.
Problem Description
Consider a complex protein mixture that must be purified by chromatographic techniques. The degree of separation depends on the difference in protein partitioning between the stationary and mobile phases. Physicochemical property data are available for the target and contaminant proteins, and each chromatographic technique separates the mixture by exploiting a specific physicochemical property, such as surface charge as a function of pH, surface hydrophobicity or molecular weight. For instance, ion exchange chromatography separates proteins based on their differences in charge, and the charge of a protein depends on pH according to its titration curve. Because ion exchange can exploit small differences in charge, it yields very high resolution and is therefore an extremely efficient operation to separate proteins. Usually, several steps are necessary to purify a protein mixture, and among the candidate techniques, high-resolution chromatography represents the most important group. Losses of target protein along the purification process are possible, and therefore the trade-off between product quality (given by purity) and quantity must be evaluated: the higher the purity achieved within each step, the smaller the product yield. The decisions therefore involve the selection of techniques, their order, and the percentage of product recovered.
Fig. 1 Problem superstructure
Mathematical Model
Mixed-integer linear optimization models for the synthesis of purification bioprocesses were developed in previous work.4 These models implicitly assumed complete product recovery; consequently, the major decisions concerned the selection and ordering of chromatographic operations. In the present case, the model must account for target protein losses along the purification process. Figure 1 represents the model superstructure as well as the major decisions involved. The key decisions in the synthesis process therefore concern the selection of the technique (index i), its order in the sequence (index k), and the recovery level of the desired protein (index l). Note that, to keep the optimization model linear, the product is assumed to be recovered in discrete percentage levels.
Model of a chromatographic technique
The modeling of chromatographic techniques is based on previous work.4 In this approach, the chromatogram of each technique is approximated by isosceles triangles. Physicochemical property data ($P_{a,p}$) for the target protein as well as for the major contaminants are required; they are used to calculate the dimensionless retention time ($Kd_{i,p}$) of each protein in each chromatographic technique:
$$Kd_{i,p} = f_i(P_{a,p}) \qquad \forall i, p,\ a \in A_i \qquad (1)$$
Parameter $DF_{i,p}$ denotes the deviation of the target protein value from the equivalent value of the contaminant. The deviation factor for each protein p in technique i is given by equation (2):
$$DF_{i,p} = \left| Kd_{i,dp} - Kd_{i,p} \right| \qquad \forall i, p \qquad (2)$$
These coefficients and mathematical correlations (DFi,p and Kdi,p) were developed by Watanabe et al.2 and their validity was tested by Lienqueo.8
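Equation (2) is simple enough to evaluate directly. The sketch below uses hypothetical $Kd$ values and a hypothetical peak width for a single technique (they are not data from the paper), computes $DF_{i,p}$ for each contaminant, and sorts it into the two regimes that the first and last rows of Table 1 treat explicitly: no purification below $s_i/10$ and almost complete separation at $DF_{i,p} \ge s_i$. The function names are illustrative only.

```python
# Hypothetical dimensionless retention times Kd_{i,p} for one technique i.
kd = {"dp": 0.40, "p2": 0.42, "p3": 0.55, "p4": 0.90}
s_i = 0.30  # hypothetical peak width of technique i


def deviation_factors(kd, target="dp"):
    """DF_{i,p} = |Kd_{i,dp} - Kd_{i,p}| for every contaminant p, equation (2)."""
    return {p: abs(kd[target] - v) for p, v in kd.items() if p != target}


def regime(df, s):
    """Classify a deviation factor by the thresholds used in Table 1."""
    if df < s / 10:
        return "no separation"        # first row: CF = 1
    if df >= s:
        return "complete separation"  # last row: CF = 0.02
    return "partial overlap"          # intermediate rows (Figures 2a to 2f)


for p, df in deviation_factors(kd).items():
    print(p, round(df, 3), regime(df, s_i))
```

Only the classification is shown; the intermediate concentration factors depend on the chosen recovery level and are not reproduced here.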
It is assumed that the peaks in the chromatograms have constant shapes and that the peak on the left refers to the product and the other to the contaminant protein. In Figure 2, the light-shaded areas represent product that is removed with the contaminants for a given discrete recovery level (index l), whereas the dark-shaded areas (with base $B_{i,p,l}$) represent the amount of contaminant p that remains in the mixture with the product after chromatographic technique i is applied. Six cases may arise, depending on the relative position of the triangles: the first corresponds to an almost complete overlap between the triangles (Figure 2a), the other extreme occurs when the triangles are completely apart (Figure 2f), and the remaining cases are shown in Figures 2b, 2c, 2d and 2e. In these figures, the fraction of product lost is the ratio of the light-shaded area to the total area of the product chromatogram. Similarly, the fraction of contaminant that remains in the mixture is the ratio of the dark-shaded area to the total area of the contaminant chromatogram. To assess the ability of a specific operation to separate two or more proteins, the concentration factor ($CF_{i,p,l}$) has been proposed; it denotes the ratio of the amount of protein p (a contaminant or the product dp) remaining in the mixture after and before chromatographic technique i at recovery level l. The mathematical correlations applied for each chromatographic technique are given in Table 1 for protein p at chromatographic step i with discrete recovery level l. Note that the concentration factor also depends on the peak width parameter ($s_i$), averaged over several proteins. Moreover, discretizing the recovery levels is equivalent to imposing a set of discrete values on $B_{i,p,l}$ in Figure 2. The relationships in Table 1 are graphical approximations of the chromatograms of two different proteins.
As a result, a small fraction of each protein is assumed not to separate from the product (represented by the 1.02 coefficients in Table 1). In Table 1, the first row corresponds to the case in which no purification is carried out. In the following rows, which correspond to Figures 2a to 2f, the degree of purification increases, up to the case of almost complete separation ($CF_{i,p,l} = 0.02$). The concentration factors $CF_{i,p,l}$ in Table 1 are introduced in the synthesis model described in the next section.
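The triangle construction can be made concrete with a short geometric sketch. It assumes both peaks are unit-area isosceles triangles with the same base width $s_i$, the product centered at 0 and contaminant p centered at $DF_{i,p}$, and a single cut that keeps everything to its left. Choosing a recovery level fixes the cut and therefore the base $B_{i,p,l}$, from which the retained contaminant fraction follows. This is our own illustration of the idea, not a transcription of Table 1: it leaves out the no-purification row and the 1.02 and 0.02 coefficients, and the function names are hypothetical.

```python
import math


def left_fraction(b: float, s: float) -> float:
    """Fraction of a unit-area isosceles triangle (base width s) lying
    within distance b of its left edge."""
    if b <= 0.0:
        return 0.0
    if b >= s:
        return 1.0
    if b <= s / 2:
        return 2.0 * b * b / (s * s)
    return 1.0 - 2.0 * (s - b) ** 2 / (s * s)


def base_for_recovery(recovery: float, df: float, s: float) -> float:
    """Base B of the contaminant area collected with the product when the cut
    keeps a fraction `recovery` of the product peak.
    Product peak spans [-s/2, s/2]; contaminant spans [df - s/2, df + s/2]."""
    lost = 1.0 - recovery
    # Width at the product's right edge holding the lost fraction
    # (inverse of left_fraction, using the triangle's symmetry).
    if lost <= 0.5:
        b_lost = s * math.sqrt(lost / 2.0)
    else:
        b_lost = s - s * math.sqrt((1.0 - lost) / 2.0)
    cut = s / 2 - b_lost
    return cut - (df - s / 2)  # equals s - df - b_lost


def contaminant_fraction(recovery: float, df: float, s: float) -> float:
    """Fraction of contaminant p still mixed with the product after the cut."""
    return left_fraction(base_for_recovery(recovery, df, s), s)
```

For s = 1 and DF = 0.8, a full-recovery cut leaves 8 % of the contaminant with the product, while accepting a 2 % product loss lowers it to 2 %: the purity/yield trade-off that the discrete levels l encode.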
Synthesis model
An optimization model that minimizes the total number of chromatographic steps for a given purity level is proposed. This model relies on a convex hull representation of disjunction (3).
Table 1 Mathematical relationships for chromatographic techniques (columns: deviation factor $DF_{i,p}$, base $B_{i,p,l}$, concentration factor $CF_{i,p,l}$, mass reduction of protein p). The first row, $0 \le DF_{i,p} < s_i/10$, gives $CF_{i,p,l} = 1$ for all p (no purification). The intermediate rows, $s_i/10 \le DF_{i,p} < s_i$ (for example $B_{i,p,l} = s_i - DF_{i,p}$ in Figure 2a), give concentration factors as expressions in $s_i$, $DF_{i,p}$ and $B_{i,p,l}$, separately for p = dp and for p ≠ dp. The last row, $DF_{i,p} \ge s_i$, gives $CF_{i,p,l} = 0.02$ (almost complete separation).
Disjunction (3) contains I·L + 1 terms for each order k. The first I·L terms model the selection of step i in order k at level l (Boolean variables $L_{i,k,l}$), whereas the last term models the case in which no step is selected (Boolean variable $A_k$). In each term, the mass of contaminant protein p at step k is related to the mass at the previous step. The constraints of the proposed MILP model, which are based on the convex hull relaxation of disjunction (3), are listed in items (a) to (e).
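The typeset disjunction is not preserved in this text. One form consistent with the description above and with the convex hull constraints (6b) to (6e) is sketched below; it is a reconstruction for the reader's convenience, not the authors' equation. Disaggregating $m_{p,k}$ into $m^1_{i,p,k,l}$ and $m^2_{p,k}$, bounded by $U l_{i,k,l}$ and $U a_k$, is what turns it into the linear constraint set (6).

```latex
\bigvee_{i \in I} \bigvee_{l \in L}
\begin{bmatrix}
  L_{i,k,l} \\
  m_{p,k+1} = CF_{i,p,l}\, m_{p,k} \quad \forall p
\end{bmatrix}
\;\vee\;
\begin{bmatrix}
  A_k \\
  m_{p,k+1} = m_{p,k} \quad \forall p
\end{bmatrix}
\qquad \text{for each order } k
```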
(a) Assignment constraints: Binary variables $l_{i,k,l}$, which correspond to the Boolean variables in disjunction (3), are defined. Constraint (4a) indicates that at most one step i may be chosen in order k; the slack variable $a_k$ is activated if no step is selected in order k. Constraint (4b) imposes that step i is selected at most once in the sequence, and (4c) states that orders are filled consecutively (order k+1 can be occupied only if order k is), which reduces model degeneracy.
$$\sum_i \sum_l l_{i,k,l} + a_k = 1 \qquad \forall k \qquad (4a)$$
$$\sum_k \sum_l l_{i,k,l} \le 1 \qquad \forall i \qquad (4b)$$
$$\sum_i \sum_l l_{i,k+1,l} \le \sum_i \sum_l l_{i,k,l} \qquad \forall k \le K-1 \qquad (4c)$$
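To see what constraints (4a) to (4c) admit, the sketch below checks a candidate assignment given as the set of (i, k, l) triples with $l_{i,k,l} = 1$. The function name and the set-of-triples encoding are our own, and the slack $a_k$ is left implicit: because $a_k \ge 0$, (4a) reduces to "at most one step per order", with $a_k = 1$ exactly when order k is empty.

```python
def check_assignment(sel: set[tuple[int, int, int]], I: int, K: int) -> bool:
    """sel holds the (i, k, l) triples with l_{i,k,l} = 1 (hypothetical encoding).
    Returns True when constraints (4a)-(4c) are satisfied."""
    steps_in_order = {k: 0 for k in range(1, K + 1)}
    uses_of_technique = {i: 0 for i in range(1, I + 1)}
    for i, k, _level in sel:
        steps_in_order[k] += 1
        uses_of_technique[i] += 1
    # (4a): sum_i sum_l l_{i,k,l} + a_k = 1 with a_k >= 0 -> at most one step per order.
    if any(n > 1 for n in steps_in_order.values()):
        return False
    # (4b): each technique appears at most once in the sequence.
    if any(n > 1 for n in uses_of_technique.values()):
        return False
    # (4c): order k+1 may be occupied only if order k is occupied.
    return all(steps_in_order[k + 1] <= steps_in_order[k] for k in range(1, K))
```

For example, technique 3 at level 1 in order 1 followed by technique 5 at level 2 in order 2 is accepted, while leaving order 1 empty and filling order 2 is rejected by (4c).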
(b) Ordering constraints: Constraints (5a) to (5d) define the last step of the sequence through the variable $Z_k$, which equals 1 if order k is the last one occupied. Constraint (5a) forces $Z_k = 1$ when order k is occupied and order k+1 is empty, (5b) allows $Z_k = 1$ only if order k is occupied, (5c) forbids any step after the last one, and (5d) states that exactly one order is the last.
$$Z_k \ge \sum_i \sum_l l_{i,k,l} - \sum_i \sum_l l_{i,k+1,l} \qquad \forall k \le K-1 \qquad (5a)$$
$$\sum_i \sum_l l_{i,k,l} \ge Z_k \qquad \forall k \qquad (5b)$$
$$\sum_i \sum_l l_{i,k+1,l} + Z_k \le 1 \qquad \forall k \le K-1 \qquad (5c)$$
$$\sum_k Z_k = 1 \qquad (5d)$$
(c) Contaminant constraints: Constraint set (6) results from the convex hull formulation of disjunction (3) and relates the masses in consecutive steps. Equation (6a) gives the mass of protein p that remains after the first step: if technique i is selected at level l ($l_{i,1,l} = 1$), it sets the mass that leaves the first stage, where $m_{p,1}$ is the initial mass of protein p.
$$m_{p,2} = \sum_i \sum_l CF_{i,p,l}\, l_{i,1,l}\, m_{p,1} \qquad \forall p \qquad (6a)$$
Constraints (6b) to (6e) hold for the following steps. Variable $m_{p,k}$ denotes the mass of protein p before step k and is disaggregated into two terms, $m^1_{i,p,k,l}$ and $m^2_{p,k}$, which correspond to the two kinds of terms of disjunction (3).
$$m_{p,k+1} = \sum_i \sum_l CF_{i,p,l}\, m^1_{i,p,k,l} + m^2_{p,k} \qquad \forall p,\ k = 2, \dots, K-1 \qquad (6b)$$
$$m_{p,k} = \sum_i \sum_l m^1_{i,p,k,l} + m^2_{p,k} \qquad \forall p,\ k = 2, \dots, K-1 \qquad (6c)$$
$$m^1_{i,p,k,l} \le U\, l_{i,k,l} \qquad \forall i, p, k, l \qquad (6d)$$
$$m^2_{p,k} \le U\, a_k \qquad \forall p, k \qquad (6e)$$
(d) Specification constraints: Constraints (7) and (8) enforce the purity and yield specifications, respectively. Constraint (7), written for k = 1, ..., K - 1, imposes that the specified purity fp of the protein of interest is achieved. In (8), if $Z_k = 1$, the ratio of the final mass to the initial mass of the desired protein must be at least the recovery fraction fr:
$$m_{dp,k+1} \ge (fr \cdot m_{dp,1})\, Z_k \qquad k = 1, \dots, K-1 \qquad (8)$$
Note that setting fr = 1 imposes complete recovery of the desired protein.
(e) Domain constraints: Finally, constraint set (9) defines the domains of the binary and continuous variables.
$$l_{i,k,l} \in \{0,1\} \qquad \forall i, k, l \qquad (9a)$$
$$m^2_{p,k},\ Z_k,\ m^1_{i,p,k,l},\ m_{p,k},\ a_k \ge 0 \qquad \forall i, p, k, l \qquad (9b)$$
An objective function (10) that selects a sequence with the minimum number of steps for given purity and yield specifications is defined as follows:
$$\min S = \sum_i \sum_k \sum_l l_{i,k,l} = \sum_k k\, Z_k \qquad (10)$$
Alternatively, profit can be maximized by taking into account the revenue from product sales and the operating costs of the chromatographic columns, when economic data are available.
Computational performance
The modeling system GAMS,9 with the solver CPLEX 7.0, was used to implement and solve the MILP model. Two examples of increasing size, which correspond to the first two presented in Vasquez-Alvarez et al.,4 are solved. In Example 1, taken from Lienqueo et al.,10 we consider the purification of a mixture containing four proteins, all in equal concentration. Their physicochemical properties and the initial protein concentrations of the mixture are given in Vasquez-Alvarez et al.4 The required purity level for p1 is 98 %. If no product loss is considered (Case a1), the results are the same as those of model (M1a) by Vasquez-Alvarez et al.,4 which comprise three steps and 99.8 % final purity (see Figure 3). However, if a 4 % loss of product is accepted (Case b1), only two steps are necessary and 99.9 % final purity is achieved, as shown in Figure 4. In Example 2, we consider the purification of β-1,3-glucanase (8.3 % initial concentration), which must be separated from eight contaminants; twenty-two chromatographic techniques are available.11 Case a2 corresponds to 94 % purity and 100 % recovery of β-1,3-glucanase, and Case b2 to 99 % purity with 6 % product loss. In Case a2, the results are the same as in Vasquez-Alvarez et al.4 and are shown in Figure 5 (six steps and 94.8 % final purity). For Case b2, three techniques are employed and 99.7 % final purity is achieved, as shown in Figure 6. Statistical data for both examples are given in Table 2.
Table 2 Summary of statistical data for Examples 1 and 2
                     Example 1          Example 2
                     Case a1  Case b1   Case a2  Case b2
L                    1        3         1        4
fr (%)               100.0    96.0      100.0    94.0
Integer variables    168      456       288      1080
S_opt                3        2         6        3
Sensitivity analysis
The objective of this analysis is to determine the effect of some parameters on the optimal solutions of the model that minimizes the number of chromatographic steps with product losses. First, we study the effect of the number of product-loss levels (L) on model solution and performance, taking Example 2 as a basis with a 99 % product purity level (fp = 0.99). For a fixed upper bound of 10 % product loss (fr = 0.90), the following alternatives were tested: ten levels of 1 % (l1), five levels of 2 % (l2), two levels of 5 % (l5) and one cut of 10 % (l10). In addition to these levels, the 100 % product recovery alternative is included. Two steps and 100 % final purity are obtained for l1 and l2, whereas three steps are necessary for l5 and l10, both with 99.8 % final purity. Figure 7 shows the effect of the number of levels on model size and computational performance.
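A quick arithmetic check on the integer-variable counts in Table 2 shows how model size scales with the number of levels, assuming the count grows linearly in L because the binaries $l_{i,k,l}$ carry the index l. With only two cases per example this is a consistency check rather than a fit. For Example 2, the resulting increment of 264 per level equals 22 × 12, which would match the 22 available techniques and twelve orders; K = 12 is our inference and is not reported in the text.

```python
# Integer-variable counts reported in Table 2, as (L, count) pairs per example.
TABLE_2 = {
    "Example 1": ((1, 168), (3, 456)),
    "Example 2": ((1, 288), (4, 1080)),
}


def linear_fit(points):
    """Exact line through two (L, count) points: returns (slope, intercept)."""
    (l0, n0), (l1, n1) = points
    slope, remainder = divmod(n1 - n0, l1 - l0)
    if remainder != 0:
        raise ValueError("counts are not an integer-slope line in L")
    return slope, n0 - slope * l0


for name, pts in TABLE_2.items():
    slope, intercept = linear_fit(pts)
    print(f"{name}: {slope} integer variables per level, intercept {intercept}")
```

Both examples share the same intercept of 24, so each extra recovery level adds one full block of I·K binaries, which is why the finer discretizations tested above enlarge the model.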
Fig. 7 Statistical data for 10 % product losses and 99 % specified purity in Example 2
Table 3 Minimum number of stages S for different purity (fp) and recovery (fr) specifications in Example 2
fr \ fp   0.90   0.94   0.96   0.98   0.99
0.82      1      1      1      1      1
0.84      1      1      1      1      2
0.86      1      1      1      1      2
0.88      1      1      1      1      2
0.90      1      2      2      2      2
0.92      2      2      2      3      3
0.94      2      3      3      3      3
0.96      2      3      3      3      3
0.98      2      3      3      Inf.*  Inf.*
1.00      4      6      Inf.*  Inf.*  Inf.*
Table 3 shows the optimal results for Example 2 under different purity and recovery requirements. Interestingly, only one step is necessary for purity levels of up to 98 % if the product recovery requirement does not exceed about 88 %. On the other hand, specifications that combine high yield and high purity are unattainable (for example, fr = 0.98 with fp of 0.98 or more). Note that the last set of results (fr = 1.00) corresponds to the cases of no product loss, as obtained in Vasquez-Alvarez et al.4 However, a significant improvement in the process can be obtained simply by relaxing the assumption of complete product recovery; for instance, for 94 % purity only three steps are required with 98 % recovery of the desired protein, compared with six steps for complete recovery.
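To make the interplay between fp and fr in Table 3 concrete, the sketch below brute-forces a tiny instance. For every sequence of distinct techniques, each run at one discrete recovery level, it propagates masses with the update $m_{p,k+1} = CF_{i,p,l}\, m_{p,k}$ used in constraint set (6) and returns the fewest steps that meet the purity fp and yield fr. The technique labels, concentration factors and masses are hypothetical, and exhaustive enumeration stands in for the MILP; it is only practical for a handful of techniques and levels.

```python
from itertools import permutations, product

# Hypothetical CF[i][l][p]: fraction of protein p kept after technique i at
# recovery level l ("100" keeps all product, "90" gives up 10 % for a sharper cut).
CF = {
    "IEX": {"100": {"dp": 1.00, "c1": 0.30, "c2": 0.90},
            "90": {"dp": 0.90, "c1": 0.05, "c2": 0.60}},
    "HIC": {"100": {"dp": 1.00, "c1": 0.80, "c2": 0.20},
            "90": {"dp": 0.90, "c1": 0.50, "c2": 0.02}},
    "GF": {"100": {"dp": 1.00, "c1": 0.70, "c2": 0.70},
           "90": {"dp": 0.90, "c1": 0.40, "c2": 0.40}},
}
M0 = {"dp": 1.0, "c1": 1.0, "c2": 1.0}  # hypothetical initial masses m_{p,1}


def min_steps(fp, fr, max_steps=3):
    """Fewest steps with final purity >= fp and product yield >= fr,
    or None if no sequence of up to max_steps steps qualifies."""
    for n in range(1, max_steps + 1):
        # permutations: each technique is used at most once, as in (4b)
        for seq in permutations(CF, n):
            for levels in product(*(CF[i] for i in seq)):
                m = dict(M0)
                for i, l in zip(seq, levels):
                    # mass balance m_{p,k+1} = CF_{i,p,l} * m_{p,k}
                    m = {p: CF[i][l][p] * m[p] for p in m}
                purity = m["dp"] / sum(m.values())
                if purity >= fp and m["dp"] >= fr * M0["dp"]:
                    return n, list(zip(seq, levels))
    return None
```

With these numbers, `min_steps(0.90, 0.85)` finds no sequence, while relaxing the yield to `min_steps(0.90, 0.80)` returns a two-step solution with both steps at the 90 % level: a small analogue of the infeasible entries in Table 3 that disappear as fr is relaxed.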
Conclusions
This paper presented an optimization model for the synthesis of chromatographic steps for the purification of protein mixtures that considers product losses. The model was based on the approximation of chromatograms by isosceles triangles and on convex hull formulations of the disjunctions that relate the selection of purification techniques. Moreover, the discretization of recovery levels in the chromatograms generated an MILP that could be solved to global optimality. The results indicate that a systematic selection and sequencing of chromatographic steps can be obtained by appropriately balancing yield and purity level.
* Infeasible solutions
Acknowledgements
The authors acknowledge financial support from PADCT/CNPq under grant 62.0239/97 QEQ, and from ANTORCHAS and VITAE (Coop. Programs Argentina Brasil Chile) under grants A-13668/19 and B-11487/10B006.
Notation
Indices
a   physicochemical property ($a \in A_i$)
i   chromatographic technique ($i = 1, \dots, I$)
k   order in the sequence ($k = 1, \dots, K$)
l   level of target protein losses ($l = 1, \dots, L$)
p   protein (product and contaminants)
dp   desired protein (product)
Parameters
$B_{i,p,l}$   width of the base of the contaminant peak area that remains with the product
$CF_{i,p,l}$   concentration factor of contaminant p after step i at level l
$DF_{i,p}$   deviation factor for protein p in chromatographic step i
fp   specified purity level of dp
fr   specified yield level of dp
$Kd_{i,p}$   dimensionless retention time of protein p in technique i
$P_{a,p}$   value of physicochemical property a for protein p
U   upper bound on protein mass
$s_i$   peak width of chromatographic step i
Variables
$m^1_{i,p,k,l}$   disaggregated variable for the mass of protein p after chromatographic technique i in order k at level l (first term of the disjunction)
$m^2_{p,k}$   disaggregated variable for the mass of protein p in order k (second term of the disjunction)
$m_{p,k}$   mass of protein p before the technique in order k
S   objective function variable
$Z_k$   binary variable that indicates whether order k is the last
$a_k$   slack variable relative to the selection of order k
$l_{i,k,l}$   binary variable for selecting technique i in order k at level l of product loss
$L_{i,k,l}$   Boolean variable for selecting technique i in order k at level l of product loss
$A_k$   Boolean variable for selecting no technique in order k
References
1. Lienqueo, M. E., Asenjo, J. A., Comput. Chem. Eng. 24 (2000) 2339
2. Watanabe, E., Tsoka, S., Asenjo, J. A., Recombinant DNA Technology II, in Annals of the New York Academy of Sciences, Bajpai, R., Prokop, A. (Eds.) 721 1495, New York, 1994
3. Larsson, G., Jorgensen, S. B., Pons, M. N., Sonnleitner, B., Tijsterman, A., Titchener-Hooker, N., J. Biotechnol. 59 (1997) 3
4. Vasquez-Alvarez, E., Lienqueo, M. E., Pinto, J. M., Biotechnol. Prog. 17 (2001) 685
5. Bryant, C. H., Rowe, R. C., TrAC Trends in Analytical Chemistry 17 (1998) 18
6. Steffens, M. A., Fraga, E. S., Bogle, I. D. L., Biotech. Bioeng. 68 (2000) 218
7. Vasquez-Alvarez, E., Pinto, J. M., Proc. ESCAPE-11 (European Symposium on Computer Aided Process Engineering), Gani, R., Jorgensen, S. B. (Eds.), Elsevier, Amsterdam (2001) 579
8. Lienqueo, M. E., Ph.D. Thesis (in Spanish), University of Chile, Santiago (Chile), 1999
9. Brooke, A., Kendrick, D., Meeraus, A., Raman, R., GAMS: A User's Guide, The Scientific Press, Redwood City, USA, 1998
10. Lienqueo, M. E., Salgado, J. C., Asenjo, J. A., Computer Applications in Biotechnology CAB7, Japan (1998) 321
11. Lienqueo, M. E., Salgado, J. C., Asenjo, J. A., Chem. Tech. Biotech. 74 (1999) 293