Good Analytical Part II
To avoid potential inspection observations for test method validations by the regulatory agencies, it has become critical for pharmaceutical companies to derive reasonable acceptance criteria for the Analytical Method Validation (AMV) protocol. Part I of Good Analytical Method Validation Practice (GAMVP) (November 2002 issue, Journal of Validation Technology) mostly emphasized ground rules for the AMV department to be compliant and efficient within the Quality Control (QC) unit. The scope of this article provides more detail on how to systematically derive reasonable acceptance criteria for AMVs, and how to integrate those into the AMV protocol. One specific example to describe the process of deriving AMV acceptance criteria is provided. This example summarizes most aspects to be considered in order to generate an AMV protocol that can readily be executed, and lead to a solid AMV report.

For successful AMVs, available data and other supporting information for the test method to be validated must be carefully reviewed against current in-process or product specifications. This process takes time and requires a certain expertise, since acceptance criteria should balance method performance expectations with method requirements (from product specifications) and AMV execution conditions (conducted by QC analysts under routine QC testing). In general, the time required to prepare the AMV protocol should account for about 50% of the total time allocated to the complete (approved) validation. Less time spent on the protocol may result in time-consuming discrepancy reports, and validation retesting when acceptance criteria failed during execution. Or, the acceptance criteria do not sufficiently challenge the test system suitability, so this validation failed to demonstrate that this method will yield accurate and reliable results under normal testing conditions. In addition, invalid and potential Out-Of-Specification (OOS) results may be obtained when test system suitability is not properly demonstrated.

Management should keep in mind that a rigorous AMV program, employing reasonable acceptance criteria, may prevent discrepancy reports, OOS results, and potential product loss, since unsuitable test methods should not be used for routine QC testing.
A n a l y t i c a l M e t h o d s Va l i d a t i o n 31
Figure 1
Required Validation Parameters for ICH Assay Categories and Specification Codes

Specification Code        1     2     3     4 and 5
ICH Category              I     II    III   IV
Accuracy                  No    Yes   No    Yes
Repeatability Precision   No    Yes   No    Yes
Intermediate Precision    No    Yes   No    Yes
Specificity               Yes   Yes   Yes   Yes
Linearity                 No    Yes   No    Yes
Assay Range               No    Yes   No    Yes
Limit of Detection        No    No    Yes   No
Limit of Quantitation     No    Yes   No    No
capability expectations. All factors should be evaluated and integrated to derive acceptance criteria. Product specifications for qualitative assays are generally coded as Match/No Match (or pass/fail, present/absent, etc.), and should be qualified or validated on a case-by-case basis. Many microbiological assays have non-normal (non-Gaussian) data distributions (usually well described by Poisson statistics), and are more difficult to classify generally for validation.
Figure 2
Graphical (Quantitative) Representation of Product Specifications
Panel 1a) Product Specification Code Number 2 (Target: NMT 20%, Range: 0-20%); axis: Percentages (ticks at 10 through 100)
Legend/Definitions: ICH Q2B Required Demonstration of Assay Range (within Assay Range Results must be: Accurate, Precise, and Linear); Method Capability = Method Performance Expectations
racy expectations (assuming expected equals true), potential systematic error (e.g., a different response factor of the spiked reference material) must be evaluated and factored into the acceptance criteria, unless the AMV protocol permits normalization, if required. To keep systematic error to a minimum, common scientific sense should be used when describing spike sample preparation in the protocol (e.g., large volumes for spiked stock solutions, only calibrated equipment).

Many quantitative assays have ranges for their product specifications (code no. 5). The midpoint of this range is the target concentration, which was set either historically from testing results or as a manufacturing process target. When deriving acceptance criteria, one should consider that test system suitability must be demonstrated for this target range, which is exactly half of the specification range (target range = target concentration ± 0.5 x specification range). During routine QC testing, the test system must be capable of readily meeting this target range, and this capability must be demonstrated in the AMV. It must therefore be demonstrated that the combined effects of lack of accuracy and reliability (precision) within the assay range can routinely be limited in order to support results within and outside product specifications (OOS). In other words, the acceptance criteria for accuracy and precision, combined within the assay range, should be no wider than half of the product specification range, because one would otherwise fail to demonstrate test system suitability for this product. Intermediate precision should ideally be used here, since all routine testing samples could be tested by any trained operator on any qualified instrument on any given day. Repeatability precision (less variability) simply would not reflect this overall assay variability. The derivation of acceptance criteria for the remaining quantitative assays (code nos. 2 and 4) should be dealt with in a similar manner.

Given the above, there are several ways to derive acceptance criteria for accuracy. One way is to derive the intermediate precision acceptance criteria first from historical data (Analytical Method Development [AMD] or QC testing). The numerical limits for intermediate precision are then
subtracted from the target range, and the remaining difference sets the maximum permissible acceptance criteria range for accuracy. This is illustrated in the AMV acceptance criteria example (Figure 6). It may be advisable not to use statistical approaches to demonstrate accuracy, such as t-statistics (comparing means of observed versus expected percent recoveries of various spike concentrations). The reason is that a potential systematic error is not accounted for in the expected recovery (mean = 100%, variance = 0). The expected recovery will then be compared to the observed recovery (mean ≠ 100%, variance ≠ 0), so that a statistical difference (e.g., t-test at 95% confidence) is likely to occur, although this difference may not be significant when compared to a numerical limit (percent or units). It may therefore be more practical to give numerical limits for the accuracy acceptance criteria. Data generated for accuracy may be used to cover required data for other validation parameters, such as repeatability precision, linearity, assay range, and Limit of Quantitation (LOQ).

Repeatability Precision indicates how precise the test results are under ideal conditions (same sample, operator, instrument, and day). Repeatability precision should be demonstrated over the entire assay range, just like accuracy, and the data generated for accuracy may be used. This has the advantage that fewer samples will have to be run. Even more important, when acceptance criteria are derived and connected, only one data set will be used, which decreases the potential random error introduced by multiple sample preparations. The demonstration of repeatability precision is mostly affected by how well random errors in sample preparation can be controlled. Random experimental errors can only be controlled to some degree, since the Standard Operating Procedure (SOP) and AMV protocol should be followed as written by operators routinely generating QC testing results.
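The subtraction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `accuracy_limit` and the example numbers (a specification range 10 percentage points wide and an intermediate precision limit of 2.5) are hypothetical and do not come from the article.

```python
def accuracy_limit(spec_range_width, intermediate_precision_limit):
    """Return the widest accuracy limit left after reserving precision.

    Assumes both arguments are in the same units (e.g., percent) and that
    the target range is half of the specification range width.
    """
    target_range = 0.5 * spec_range_width
    remaining = target_range - intermediate_precision_limit
    if remaining <= 0:
        # Precision alone already uses up the whole target range.
        raise ValueError("intermediate precision exceeds the target range")
    return remaining


# Hypothetical numbers: specification range 10 wide, precision limit 2.5.
print(accuracy_limit(10.0, 2.5))
```

If the function raises, the historical precision is too poor to leave any room for accuracy, which suggests the method may not be suitable for the specification as written.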
When using AMD data, the actual conditions under which these data were generated must be evaluated and put into perspective to set AMV acceptance criteria. When using QC routine testing data, data for the assay control can be summarized and used as a worst-case scenario for the AMV protocol. The Standard Deviation (SD) of these historical data can be expressed as confidence limits (e.g., 95% confidence ≈ 2 x SD), units, or percent
34 I n s t i t u t e o f Va l i d a t i o n Te c h n o l o g y
(coefficient of variation, CV = SD/Mean x 100%), and should be used as the absolute limit for the AMV data, since historical data (several operators, instruments, days) should have less precision (greater CV) than AMV data.

Intermediate Precision should be demonstrated by generating a sufficiently large data set that includes replicate measurements of 100% product (analyte) concentration. These data should ideally be generated by three operators on each of three days, on each of three instruments. Different analyte concentrations could be used to demonstrate intermediate precision over the entire assay range, but results must be converted to percent recoveries before they can be compared. Figure 3 illustrates a data matrix in which the total number of samples is limited, but differences among or between variability factors, such as operators and days, can still be differentiated. The complete data set should then be statistically evaluated by an Analysis of Variance (ANOVA), where results are grouped by each operator, day, and instrument, but analyzed in one large table. Acceptance criteria should state that ANOVA shows no significant difference among the data sets at 95% confidence (p > 0.05). It is advisable to include a numerical limit (or percentage) because the likelihood of observing statistical differences increases with the precision of the test method. In addition, some differences among various instruments, operator performances, and days (e.g., sample stability or different sample preparations for each day) are normal. The overall intermediate precision allowed should be relative to the expected accuracy, and must be within the combined limits for accuracy and intermediate precision. Additional F-tests and t-tests should be performed if the overall p-value is less than 0.05 to evaluate the differences among factors and within factors. More detail will be provided in Part III of GAMVP: Data Analysis and the AMV Report.

Figure 3

Specificity of an assay is usually ensured by demonstrating no or insignificant matrix and analyte interference. The matrix may interfere with assay results by increasing the background signal (noise). Or, matrix components may bind to the analyte of interest, therefore potentially decreasing the assay signal. Spiking the analyte of interest into the product (liquid), and comparing the net assay response increase versus the same spike in a neutral liquid (e.g., water or buffer), provides information on potential matrix interference. Reasonable acceptance criteria are: no observed statistical difference (t-test at 95% confidence) between assay responses of spiked samples of product matrix and those of buffer matrix. If the assay precision is relatively high, it is advisable to also include a numerical limit, in case p < 0.05, which should be similar to the limit stated under the validation parameter repeatability precision. This has the advantage that, in case a statistical difference is observed, a reasonably derived numerical limit should be able to compensate for differences in sample preparation. Other analytes potentially present in the product matrix should be spiked in proportional concentrations into the product matrix (keeping final analyte concentrations constant). Results of unspiked versus spiked product should also be compared by a t-test, and the acceptance criteria should be the same as those for matrix interference.

Linearity of the assay response demonstrates proportionality of assay results to analyte concentration. Data from accuracy may be used to evaluate this parameter.
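The ANOVA evaluation described for intermediate precision can be illustrated with a single-factor sketch. This is a simplification, not the article's procedure: the article groups results by operator, day, and instrument, whereas the hypothetical `one_way_anova_f` below handles one factor only, and the percent recoveries are made-up numbers. Comparing the F statistic to a critical value (or converting it to a p-value) is left to statistical tables or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of results per factor level
    (e.g., one list per operator). Requires some within-group variation.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual results around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percent recoveries for three operators, three replicates each.
recoveries = [
    [99.0, 100.0, 101.0],
    [100.0, 101.0, 102.0],
    [101.0, 102.0, 103.0],
]
print(one_way_anova_f(recoveries))
```

A very precise method shrinks the within-group term, so even small operator differences inflate F. That is the reason the text recommends pairing the statistical criterion with a numerical limit.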
Linearity should be evaluated through a linear regression analysis, plotting individual results of either analyte concentration versus assay results, or observed versus expected results. The latter approach should ideally yield a regression line slope of one (1). A slope smaller than one indicates a decreasing assay response with increasing analyte concentrations, and vice versa. A y-intercept significantly greater or less than 0 with a slope of one suggests a systematic error (e.g., sample preparation or spiked sample response factor ≠ 1). A correlation coefficient less than one may reflect a lack of linearity, inaccuracy, imprecision, or all of the above. ICH Q2B requires reporting the regression line y-intercept, slope, correlation coefficient, and the residual sum of squares. Only acceptance criteria for the slope and the correlation coefficient need to be reported for linearity. Deriving these from accuracy and precision expectations is rather complex, and may not be practical. Depending on the sample preparation and the method capabilities for accuracy and precision, reasonable acceptance criteria should be stated in the AMV protocol. Reasonable criteria are: r ≥ 0.98 (98% curve fit), and the 95% confidence interval of the regression line slope should contain 1.

The Assay Range of a method must bracket the product specifications. By definition, the LOQ constitutes the lowest point of the assay range, and is the lowest analyte concentration that can be quantitated with accuracy and precision. In addition to the required accuracy and precision for all analyte concentration points within the assay range, the assay response must also be linear, as indicated by the regression line coefficient. Data for the assay range may be generated in the AMV protocol under accuracy. Again, the advantages are a limited sample size to be run and evaluated, and the ability to evaluate this and other validation parameters from one set of prepared samples. Acceptance criteria for assay range should therefore be identical to those of accuracy, repeatability precision, and linearity.

Limit of Detection (LOD) of an analyte may be described as that concentration giving a signal significantly different from the blank or background signal. ICH Q2B suggests three different approaches to determine the LOD. Other approaches may also be acceptable when these can be justified. Per ICH, the LOD may be determined by visual inspection (A), signal-to-noise ratio (B), or the SD of the response and the slope (C).
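The linearity criteria named above (slope, y-intercept, correlation coefficient, residual sum of squares, and a slope confidence interval that contains 1) can be sketched with ordinary least squares. This is a minimal illustration, assuming an observed-versus-expected plot. The helper names `linear_fit` and `passes_linearity` and the observed values are hypothetical; the spike levels are those listed in the article's example, and 2.447 is the two-sided 95% t-value for 6 degrees of freedom (n = 8 points).

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares fit of y on x; needs at least three points."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three points are required")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, with n - 2 degrees of freedom.
    se_slope = sqrt(rss / (n - 2) / sxx)
    return {"slope": slope, "intercept": intercept, "r": r,
            "rss": rss, "se_slope": se_slope}


def passes_linearity(fit, t_crit, r_min=0.98):
    """Check r >= r_min and that the slope confidence interval contains 1."""
    half_width = t_crit * fit["se_slope"]
    low = fit["slope"] - half_width
    high = fit["slope"] + half_width
    return fit["r"] >= r_min and low <= 1.0 <= high


# Expected spike levels from the article; observed values are hypothetical.
expected = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0, 20.0]
observed = [0.1, 0.4, 1.1, 1.9, 5.2, 9.8, 15.1, 20.3]
fit = linear_fit(expected, observed)
print(fit)
print(passes_linearity(fit, t_crit=2.447))
```

The critical t-value is passed in rather than computed so that the sketch stays within the standard library. A different number of points requires a different value from a t-table.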
Visual inspection should only be used for qualitative assays where no numerical results are reported. The signal-to-noise approach (B) may be used whenever analyte-free product matrix is available. The analyte should then be spiked at low concentrations in small increasing increments into the product matrix. The LOD is then determined as the signal-to-noise ratio that falls between 2:1 and 3:1. This is the simplest and most straightforward quantitative approach.
Acceptance criteria derived for approach B should be similar to those based on repeatability precision. Criteria could be, for a desired signal-to-noise ratio of 3:1, three times the SD of repeatability precision.

Approach C uses the following formula: LOD = 3.3 s/m, where s is the SD of the response, and m is the slope of the calibration or spiked-product regression line. An estimate of the LOD is then obtained by the principle of the method of standard additions. This is graphically represented in Figure 4. If an assay simultaneously quantitates the active product and the impurity, data generated in the accuracy section and evaluated in linearity may be used to estimate the LOD using the regression line approach. Sufficiently low analyte (impurity) concentrations must be included in the initial data set for accuracy to evaluate the LOD from one sample preparation set. The LOD acceptance criteria for approach C should be identical to those based on repeatability precision if the identical data set was used. When linearity data is evaluated by regression analysis, the LOD must not exceed the repeatability precision criteria when the predicted SD regression line y-intercept is multiplied by 3.3, and divided by the regression line slope (slope 1).

For approach A, B, or C, and any other justified approaches, the LOD acceptance criteria must be significantly lower than the product specifications and the LOQ. Selecting and justifying a particular approach should be done with a knowledge of method capabilities, in particular the level of precision. One cannot expect to determine a relatively low LOD, as the variance within low analyte concentrations is relatively high.

Figure 4
(Plot annotations: LOD: 0% Spike + 3.3 SD (1.84%); LOQ: 0% Spike + 10 SD (3.15%))

Limit of Quantitation (LOQ) is by definition the lowest analyte concentration that can be quantitated with accuracy and precision. Since the LOQ constitutes the beginning of the assay range, the assay range criteria for linearity must be passed for the particular analyte concentration determined to be the LOQ. The determination of the LOQ involves the same approaches (A, B, and C) as those for LOD. The only difference is the extension of the required signal-to-noise ratio to 10:1 (approach B), or the change in the formula (approach C) to: LOQ = 10 s/m. The acceptance criteria for LOQ should therefore be set proportionally similar to those indicated for LOD. In addition, the LOQ acceptance criteria should contain the same limits for accuracy, repeatability precision, and linearity, as set for each of these validation parameters.

Two cautions should be considered when following ICH approach C. One, the determination of particular analyte concentrations for LOD and LOQ is independent of sample size, but sample size should be 6. Individual results plotted for each analyte concentration tested (instead of averages) generally yield higher SDs, and therefore higher LODs and LOQs. Two, approach C only delivers acceptable LODs and LOQs when the assay response is highly linear, precise, and accurate over the plotted concentration range. In addition, the spiked sample preparation must be accurately performed to prevent further random deviations from the regression line. If any of these issues is a real concern, a different justified approach should be chosen.

Robustness should be addressed during method development.
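The approach C formulas, LOD = 3.3 s/m and LOQ = 10 s/m, reduce to one line each. The sketch below is illustrative only: the function names and the input values (an SD of 0.2 and a slope of 1.0) are hypothetical.

```python
def lod(sd_response, slope):
    """Limit of Detection per approach C: 3.3 * s / m."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return 3.3 * sd_response / slope


def loq(sd_response, slope):
    """Limit of Quantitation per approach C: 10 * s / m."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return 10.0 * sd_response / slope


# Hypothetical inputs: SD of the response 0.2, regression slope 1.0.
print(lod(0.2, 1.0), loq(0.2, 1.0))
```

Because both limits scale directly with the SD, plotting individual results instead of averages (which raises the SD) raises the LOD and LOQ in the same proportion, as the text cautions.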
The main reason is that a method and its governing SOP are not to be changed for routine testing and the validation of that SOP. The SOP controls operational limits within the overall system suitability criteria that are set during AMD. Deliberate small changes to the test system should be done during development, because significant differences in the AMV results may not be easily explained in the AMV report. System Suitability should be demonstrated by showing
that a complete test system is capable of delivering accurate and reliable results over time when used under routine QC testing conditions. All materials to be tested or used in testing should be stable in excess of the duration of the test procedure. Appropriate reference material (standards and/or controls) should be used to establish and control system suitability. Standards and controls should have reasonable acceptance limits properly derived from historical data. These limits should be regularly monitored and adjusted to account for minor changes, such as those potentially expected from switching reagents. Overall test system suitability is generally demonstrated by passing the acceptance criteria of all AMV parameters evaluated. During the AMV execution, all invalids, repeats, and OOS results generated should be evaluated in the AMV report. More detail will be provided in Part III of GAMVP.
Approach
The CZE test method must be validated for content/potency (major component) and for quantitation of impurities. From the information listed in Figure 1, the CZE test method must be validated simultaneously for ICH category I and II. The required validation parameters are accuracy, repeatability precision, intermediate precision, specificity, linearity, assay range, LOD, and LOQ. The next step is to analyze product specifications, and compare those to the historical assay performance. In general, the historical assay performance can be evaluated from AMD data, previous validation data, historical product final container QC testing data, and historical assay control data. Since we are revalidating this CZE test procedure without having changed test method system parameters besides our minor product reformulation, there is no need to evaluate AMD and previous validation data. Assuming that there were no recent minor changes (e.g., a change in reagent manufacturer) that could have shifted historical results for the assay control (and product), historical QC data for final containers of product and for the assay control from the last several months (n ≥ 30) should be evaluated. Historical product results will contain lot-to-lot variation due to an expected lack of complete product uniformity. These results are therefore expected to have a greater variation than those of the assay control. The historical QC testing data for the control and product are listed in Figure 5.
Figure 5
Historical Testing Data for the Assay Control and Product Over the Last Six Months

Sample/Statistic                      Percent Purity     Percent Impurity A   Percent Impurity B
                                      Prod.    Cont.     Prod.    Cont.       Prod.    Cont.
Product Specifications                NLT 90%            NMT 5%               NMT 10%
n                                     90       90        90       90          90
Mean (in percentages)                 94.1     91.4      2.0      2.8         3.9
Standard Deviation (in percentages)   1.32     1.14      0.43     0.31        0.55
CV (in percentages)                   1.41     1.25      28.6     11.1        13.8

KEY: Prod. (Product) Cont. (Control)
The data of Figure 5 may then be used to generate the acceptance criteria for all required validation parameters. Figure 6 lists each validation parameter with the relevant AMV design, brief sample preparation, reported results, acceptance criteria, and a rationale for acceptance criteria for those areas.
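As a worked check on how historical control data can feed the limits, the sketch below recomputes the CV (SD/Mean x 100%) for two assay control columns. The mean and SD pairs (91.4 and 1.14 for percent purity, 2.8 and 0.31 for impurity A) are read from the flattened Figure 5, so their column assignment is an assumption. The helper name `cv_percent` is hypothetical.

```python
def cv_percent(mean_value, sd_value):
    """Coefficient of variation in percent: SD / Mean x 100."""
    return sd_value / mean_value * 100.0


# Assay control data as read from Figure 5 (column assignment assumed).
control = {
    "purity": (91.4, 1.14),
    "impurity A": (2.8, 0.31),
}

for name, (m, s) in control.items():
    one_sd = cv_percent(m, s)
    two_sd = 2.0 * one_sd  # the 2 x SD (about 95% confidence) limit
    print(name, round(one_sd, 2), round(two_sd, 2))
```

Under that reading, the one-SD and two-SD values come out close to the NMT 1.3 / NMT 2.5 and NMT 11 / NMT 22 limits quoted in Figure 6, which suggests those limits were derived from the control CVs in this way.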
and on which instrument. This table will demonstrate to the reader of this document that the proposed validation is well planned, and should furthermore prevent execution deviations by the operators. A validation execution matrix example is given in Figure 8. A list of references to the governing Standard Practice (SP) and supporting documents assures the reader that all relevant procedures are followed, and that relevant supporting documents (CoA, product specifications, historical data, and supporting reports) were consulted. All supporting documents should be attached (list of attachments) and filed with the protocol. A final section, AMV matrix and acceptance criteria, is helpful: it refers the reader to a table summarizing each validation parameter's validation approach, reported results, and acceptance criteria. Information can be copied from the validation parameter section.
Figure 6
AMV Design
Sample Preparation
Reported Results
Mean purity (n=3) in %, identification (n=3): Yes/no
Acceptance Criteria
Identification of commercially purchased proteins must match impurity protein A and B, respectively.
Pre-requirement (2)
Accuracy
Identification and purity of commercially purchased protein impurity A and B must be determined using complementary tests (other methods such as SDS-PAGE, HPLC, HPSEC, MS, Western Blot). Run in triplicate.
Potential response factor differences for protein impurity A and B must be determined. Differences in purity and/or response factors must be normalized for percent recovery calculations. Run in triplicate.
Percent recoveries of commercially purchased reference material for protein impurity A and B will be determined from increasing spike concentrations by using Relative Percent Area (RPA). RPAs for each protein impurity and corresponding therapeutic protein will be determined using individual response factors (if required). All spike concentrations will be run in triplicate by Operator 1 on Day 1 using Instrument 1. Percent Recovery = (Observed RPA/Expected RPA) x 100%.
Follow SOP for CZE. Ideally, protein impurity A and B should be tested individually at product specification concentration, and final container product lot (A) should be tested at 100%. Spike commercially purchased protein impurity A and B each into reformulated final container product (lot A) with increasing concentrations (0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0, 20.0 %) keeping final protein concentration constant.
N/A
Mean area counts for each of impurity A and B. Response factors.
None
Data: three replicates over three concentrations covering the Assay Range.
Mean percent recoveries (n=3) for each spiked concentration (n=7) for impurity A, impurity B, and the corresponding percent recoveries for the therapeutic protein will be tabulated.
Mean spike recoveries for impurity A and impurity B for each spike concentration (n=7) must fall within 100±40% and 100±20%, respectively. Each corresponding mean spike recovery (n=2x7) for the therapeutic protein must fall within 98-102%.
The combination (worst-case scenario) of assigned limits for Intermediate Precision and Accuracy must be no greater than the difference between historical mean product results (n=3, see Table 3) and their corresponding product specifications (n=3). A worst-case limit of historically recorded 2 SDs (of assay control, see Intermediate Precision) has been assigned to Intermediate Precision. This limit is then subtracted from the product specifications, and constitutes the maximum value for the acceptance criteria for Accuracy. An example for the therapeutic protein
Repeatability Precision
Data will be generated in Accuracy to demonstrate precision over the entire Assay Range. In addition, Operator 1 on Day 1 using Instrument 1 will generate n=15 data points using one final product container lot. This extensive data set for Repeatability Precision will be used to generate the appropriate number of significant digits to be reported for test results.
Follow SOP for CZE and test one final product container lot (A) at 100%.
Intermediate Precision
One unspiked final product container lot (A) will be tested in triplicates on each of three days by each of three operators on each of three instruments. Intermediate Precision will be determined for each purity and integrity characteristic by using an Analysis of Variance (ANOVA). Any statistical differences (at the 95% confidence level) between and within factors (operators, days, instruments) will
Mean CVs (n=8) from Accuracy data must be within the following limits (in RPA): % therapeutic protein: NMT 2.5, % impurity A: NMT 22, % impurity B: NMT 13. CVs (n=3) from 15 data points must be within the following limits (in RPA): % therapeutic protein: NMT 1.3, % impurity A: NMT 11, % impurity B: NMT 6.7.
Follow SOP for CZE and test one final product container lot (A) at 100%.
Data/Report: No specific requirements. Variations (factors) to be studied (in a matrix) are days, operators, and equipment.
Overall and individual P-values of factors (operators etc.) from ANOVA. Overall and factor CV(s) and SD(s) for % therapeutic protein, protein impurity A, and protein impurity B.
P-value of ANOVA must be NLT 0.05. If p < 0.05, additional F-tests and t-tests will be performed to isolate factors with statistically different means and/or variations. An investigation must demonstrate that each different factor mean (at p=0.05) will not affect assay performance and overall system suitability.
Data: Nine determinations over Assay Range (e.g., three replicates over three concentrations). six determinations at 100% test concentration.
Report: Standard Deviation (SD), Coefficient of Variation (CV), Confidence Interval (CI).
From Accuracy data: CVs (in %), means (n=3), SDs, CIs (p=0.05) for means, for % therapeutic protein, protein impurity A, and protein impurity B.
From Repeatability data: CV (in %), mean (n=15), SD, CI (p=0.05) for mean, for % therapeutic protein, protein impurity A, and protein impurity B.
Overall CV must comply with the following limits: % therapeutic protein (in RPA): NMT 2.5, % impurity A: NMT 22, % impurity B: NMT 13. Factor CVs must comply with the following limits: % therapeutic protein (in RPA): NMT 1.3, % impurity A: NMT 11, % impurity B: NMT 6.7.
No statistically significant difference (at 95% confidence level) shall be obtained (p > 0.05) in ANOVA. If p < 0.05, additional F-tests and t-tests will be performed to isolate spiked samples with statistically different means and/or variations. An investigation must demonstrate that each different factor mean (at p=0.05) will not affect assay performance and overall system suitability. The difference(s) among spiked
Specificity
Matrix interference: Matrix interference will be evaluated by comparing results for each impurityspiked (A and B) sample, spiked into final product container (lot A), to those of spiked assay control, and spiked current final product (lot B). Percent recoveries will be compared by ANOVA and, if required, by t-tests to evaluate potential differences between product lot (lot A), the assay control, and current final product (lot B). One operator will run all samples on one day on one instrument. The following samples will be prepared: Three spiked sample preparations of each impurity (n=2) for each sample
Matrix interference: All samples (constant final concentrations) will each be spiked with 5% of protein impurity A and B.
No specific requirements.
Individual and mean (n=3) RPAs and corresponding percent recoveries for spiked samples (n=6) will be reported. An ANOVA table will be presented.
The means and precision variabilities among and between factors should not be statistically different at 95% confidence. Similar to Intermediate Precision, the likelihood of observing statistical difference(s) increases with assay precision, and may not impact system suitability. In addition, we should account for potential differences in results due to sample preparations. It is therefore advisable to set an escape clause by generating numerical limits for difference limit (1 SD of assay control) from the historical data. It is more meaningful to use the historical assay control data (see Table 3) here
matrices (lots A and B, and assay control) for each spiked impurity (n=2), must be no greater than the following limits (in RPA): NMT 1.3, % impurity A: NMT 11, % impurity B: NMT 6.7.
Correlation coefficient ≥ 0.98 for each of three regression lines. All three CIs (at 95% confidence) for each regression line slope must contain 1.
(n=3). All samples will be run three times (total runs: n=3x2x3x3=54). Analyte interference: Analyte interference can be inferred from the matrix interference studies.
Linearity
Linearity will be determined at the low percentage range (approx. 0-20 RPA) to cover a potential impurity range (NMT 5% impurity A; NMT 10% impurity B), and at the high percentage range (approx. 75 to 95 RPA) to cover the product specifications for the therapeutic protein (NLT 90%). Three regression lines will then be generated, one each for the two low (impurity A and B), and one for the high (therapeutic protein) percentage ranges. Individual RPA results (n=3 for each spiked concentration) for each spiked concentration (0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0, 20.0%) will be plotted against actual spike concentrations (in RPA) present.
Sample Preparation (Linearity): See Accuracy.
Sample Preparation (Assay Range): See Accuracy.
Assay Range will be determined at the low percentage range (approx. 0-20 RPA) to
Correlation coefficient(s), y-intercept(s), slope(s) of regression line(s), and Residual Sum(s) of Squares (RSS) should be reported. A plot of the data (regression line) to be provided. NLT 5 concentrations to be tested.
Regression line slopes, intercepts, correlation coefficients, RSS for each regression line. Plots (n=3) of the regression lines of individual RPA results (n=3 for each spiked concentration) for each spiked concentration (0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0, 20.0%) versus actual spike concentrations (in RPA) present will be provided.
Because lack of Accuracy, Repeatability Precision, and differences in sample preparation(s) may contribute to a decrease in regression line fit (lower correlation coefficient), a generally acceptable correlation coefficient (≥ 0.98) should be used here. The confidence limits of the slope should contain 1, since otherwise assay response may not be sufficiently proportional to support quantitative results over the entire assay range.
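The two linearity criteria above (correlation coefficient ≥ 0.98 and a 95% confidence interval of the slope that contains 1) can be sketched as follows. This is a minimal illustration, not the article's own procedure: the function names, the simulated RPA values, and the choice to pass the t critical value in from a t-table (2.074 for 22 degrees of freedom, i.e. 8 spike levels x 3 replicates minus 2) are all assumptions made for the example.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y on x with the statistics to be reported."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual Sum of Squares (RSS) around the fitted line
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return {
        "slope": slope,
        "intercept": intercept,
        "r": sxy / math.sqrt(sxx * syy),
        "rss": rss,
        "se_slope": se_slope,
    }

def linearity_passes(x, y, t_crit, r_min=0.98):
    """True if r >= r_min and the slope confidence interval contains 1."""
    fit = fit_line(x, y)
    half_width = t_crit * fit["se_slope"]
    lower = fit["slope"] - half_width
    upper = fit["slope"] + half_width
    return fit["r"] >= r_min and lower <= 1.0 <= upper

# Spike levels from the protocol; three replicates per level.
SPIKES = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0, 20.0]
x = [level for level in SPIKES for _ in range(3)]

# Hypothetical results: replicate scatter of +0.1, 0.0, -0.1 RPA.
noise = [0.1, 0.0, -0.1] * len(SPIKES)
y_proportional = [xi + e for xi, e in zip(x, noise)]
y_biased = [0.9 * xi + e for xi, e in zip(x, noise)]  # 10% proportional bias

T_CRIT = 2.074  # two-sided 95%, 22 degrees of freedom (t-table value)
print(linearity_passes(x, y_proportional, T_CRIT))
print(linearity_passes(x, y_biased, T_CRIT))
```

In this sketch the biased series still has a correlation coefficient well above 0.98, yet it fails because its slope interval excludes 1, which is why the slope criterion is stated separately from the correlation coefficient.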
Assay Range
All results generated within the determined Assay Range must be accurate and precise. The
Institute of Validation Technology
Stephan O. Krause
regression line slope CIs (95% confidence) must contain 1. All acceptance criteria for Accuracy, Repeatability Precision, and Linearity must be passed.
cover a potential impurity range (NMT 5% impurity A; NMT 10% impurity B), and at the high percentage range (approx. 75 to 95%) to cover the product specifications for the therapeutic protein (NLT 90 %). For details, see Linearity section.
Limit of Detection
The LOD will be determined for each impurity (A and B) concentration from data generated in the Accuracy section and evaluated in the Linearity section. For details, refer to the Linearity section. Since final product container lot (A) may contain significant levels of each of impurity A and B (> 1%), the LOD will be determined from the regression lines generated for impurity A and B in the Linearity section as per section VII.C.1 of ICH Guidance to Industry document Q2B. LOD = (3.3 × σ) / S. The slopes (S) will be determined from the linear regression data for each impurity (A and B). The standard deviation (σ) of the response will be determined from
for each regression line will be reported. All coefficients of variation (CV) for RPA for each spiked concentration will be reported. An overall CV for each of the three spiked sample series (impurity A, B, and therapeutic protein) will be reported.
Approach C (see section LOD of this article): LOD = (3.3 × σ) / S, where σ = SD of response and S = regression line slope.
All concentrations and results (in RPA) will be tabulated. The apparent LODs (in RPA) for each impurity (n=2) will be reported.
The LODs for impurity A and B must be NMT 0.4% and 0.9%, respectively.
In general, this ICH-recommended approach to determining LOD may yield relatively high values for LOD (and LOQ) versus some alternative approaches. The level of Accuracy, Repeatability Precision, and Linearity in results generated by this test system will be reflected in the LOD (and LOQ). The LOD should be lower than the LOQ (about 33% of it, since 3.3/10 = 0.33), which in turn must be significantly less than the historical product impurity means. See also LOQ.
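As a rough illustration of the calculation described for Approach C, the sketch below computes LOD and LOQ from the standard deviation of the response and the regression line slope, then compares them with the impurity A limits stated in the acceptance criteria. The function names and the input values for sigma and slope are hypothetical and serve only to show the arithmetic.

```python
def detection_limit(sigma, slope, factor=3.3):
    """LOD = (3.3 x sigma) / S, with sigma = SD of response, S = regression slope."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return factor * sigma / slope

def quantitation_limit(sigma, slope):
    """LOQ = (10 x sigma) / S, same inputs as the LOD."""
    return detection_limit(sigma, slope, factor=10.0)

# Hypothetical inputs (in RPA); real values come from the
# Repeatability Precision and Linearity sections of the AMV.
sigma = 0.10
slope = 1.0

lod = detection_limit(sigma, slope)
loq = quantitation_limit(sigma, slope)

# Acceptance limits for impurity A from the protocol: LOD NMT 0.4%, LOQ NMT 1.1%
print(lod <= 0.4, loq <= 1.1)
```

Because both limits share the same sigma and slope, the ratio LOD/LOQ is fixed at 0.33 regardless of the data.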
the RPA results for each impurity (A and B) in the Repeatability Precision section.
Limit of Quantitation
The LOQ will be determined for each impurity (A and B) concentration from data generated in the Accuracy section, and evaluated in the Linearity section. For details, refer to the Linearity section. Since final product container lot (A) may contain significant levels of each of impurity A and B (> 1%), the LOQ will be determined from the regression lines generated for impurity A and B in the Linearity section, as per section VIII.C.1 of ICH Guidance to Industry document Q2B. LOQ = (10 × σ) / S. The slopes (S) will be determined from the linear regression data for each impurity (A and B). The standard deviation (σ) of the response will be determined from the RPA results for each impurity (A and B) in the Repeatability Precision section.
System Suitability
All current criteria for system suitability (per SOP) must be satisfied in order for each test to be considered valid. Each failing test will be
Approach C (see section LOQ of this article): LOQ = (10 × σ) / S, where σ = SD of response and S = regression line slope.
All concentrations and results (in RPA) will be tabulated. The apparent LOQs (in RPA) for each impurity (n=2) will be reported.
The LOQs for impurity A and B must be NMT 1.1% and 2.8%, respectively.
The LOQ should be significantly less than the historical mean impurity results (2.0% and 3.9% for impurity A and B, respectively, see Table 3). We can determine the LOQ (and therefore the LOD) by subtracting 2SDs for product impurity results from the historical mean impurity results (e.g., impurity A: 2.0% - 2 x 0.43% = 1.14%). See also rationale under LOD.
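The arithmetic behind these limits can be reproduced with a short sketch. It takes the historical mean and SD quoted above for impurity A and subtracts 2 SDs to obtain the LOQ limit. Scaling that limit by 3.3/10 to obtain the LOD limit is an assumption made here, based on the two formulas sharing the same σ and S; the function names are hypothetical.

```python
def loq_limit(historical_mean, historical_sd, n_sd=2.0):
    """LOQ acceptance limit: historical impurity mean minus n_sd standard deviations."""
    return historical_mean - n_sd * historical_sd

def lod_limit_from_loq(loq):
    """LOD acceptance limit, assuming LOD/LOQ = 3.3/10 (same sigma and slope)."""
    return loq * 3.3 / 10.0

# Impurity A: historical mean 2.0%, SD 0.43% (values quoted in the text)
loq_a = loq_limit(2.0, 0.43)          # 2.0 - 2 x 0.43 = 1.14
lod_a = lod_limit_from_loq(loq_a)     # about 0.38

print(round(loq_a, 1), round(lod_a, 1))

# Impurity B: starting from the stated LOQ limit of 2.8%
lod_b = lod_limit_from_loq(2.8)
print(round(lod_b, 1))
```

Rounded to one decimal place, these values agree with the limits in the acceptance criteria (LOQ NMT 1.1% and LOD NMT 0.4% for impurity A; LOD NMT 0.9% for impurity B).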
No specific requirements.
Number of valid and invalid tests. Appropriate number of significant digits to be used for final result reporting.
As per SOP. No acceptance criteria for number of invalids and appropriate number of significant digits.
System suitability will be demonstrated by passing all acceptance criteria. System suitability criteria of the SOP may change, depending on the number
repeated per SOP until the current criteria are met. System suitability will be evaluated by listing invalid tests. The appropriate number of significant digits in reported results will be determined following ASTM E29-02.
mance expectations should be reflected in an Acceptance Criteria System (ACS) where all acceptance criteria for the required validation parameters (as per assay classification) are meaningful, and will focus on permissible worst-case conditions. Like most concepts, the ACS has several drawbacks. One, it takes time and experience to evaluate and integrate all assay performance expectations into one system for all validation parameters, especially when validation data will be generated under QC routine testing
Figure 8
conditions. Two, systematic errors introduced during sample preparation for spiking studies to determine accuracy (percent recovery) may not be accounted for when the ACS is developed solely from historical data and method capabilities (initially small errors can also be magnified by the end of a dilution series). Three, when one validation parameter fails its acceptance criteria, in general all validation parameters will fail, potentially leading to a complete failure to demonstrate test system suitability. On the other hand, the opposite must then also be true: all criteria within the complete ACS will be passed when one acceptance criterion is passed. Although the ACS may only be a concept at this point, and may not be applicable to all AMVs, the potential advantages of a well-developed ACS should outweigh the drawbacks, because the ACS is solid as a system and can easily be justified and defended. Each individual acceptance criterion is now meaningful, related to all others, and reflects the test system performance capabilities. The concept of ACS should be considered for accuracy, precision (repeatability and intermediate), assay range, LOQ, and specificity. However, deriving acceptance criteria for the linearity parameter will be difficult, since an estimation of the potential
worst-case combination(s) of regression line slope, y-intercept, and regression coefficient becomes very complex. With a well-developed ACS, the auditors can no longer criticize acceptance criteria. Acceptance criteria are now derived as part of the ACS, which, in turn, demonstrates method performance with respect to product specifications, historical data, and method capabilities. Furthermore, the ACS is a dynamic system that can be readily adapted as a unit to changes to the system, or for other reasons for revalidation. With experience, it will become easier and faster to set up an ACS, even for the AMV of a new test method.
Conclusion
Deriving reasonable acceptance criteria requires experience and a deep understanding of the method capabilities, product specifications, and historical data. This article provides a detailed approach to derive these criteria, which can now be justified and easily defended in an audit. The AMV can now accurately demonstrate that the test system is suitable for its intended use.
Acknowledgement
I would like to thank my colleague, Christopher Fisher, for his helpful comments and critical review of this article.
Reference
1. As per ASTM E 29-02 Section 7.4, the following instructions are given: A suggested rule relates the significant digits of the test result to the precision of the measurement expressed as the standard deviation σ. The applicable standard deviation is the repeatability standard deviation (see Terminology E 456). Test results should be rounded to not greater than 0.5 σ or not less than 0.05 σ, provided that this value is not greater than the unit specified in the specification (see 6.2). When only an estimate, s, is available for σ, s may be used in place of σ in the preceding sentence. Example: A test result is calculated as 1.45729. The standard deviation of the test method is estimated to be 0.0052. Rounded to 1.457 since this rounding unit, 0.001, is between 0.05 σ = 0.00026 and 0.5 σ = 0.0026. For the rationale for deriving this rule, refer to ASTM E 29-02. For definitions refer to ASTM E 456.
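The rounding rule quoted above can be sketched in a few lines. The helper names are hypothetical, and two choices are assumptions of this sketch rather than part of the quoted text: the rounding unit is taken as the largest power of ten not greater than 0.5 σ (which always falls between 0.05 σ and 0.5 σ), and ties are rounded half to even. The proviso about the unit specified in the specification is handled by an optional argument.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def rounding_unit(sd, spec_unit=None):
    """Largest power of ten not greater than 0.5 x sd (assumed reading of the rule)."""
    upper = Decimal(str(sd)) * Decimal("0.5")
    if upper <= 0:
        raise ValueError("sd must be positive")
    # adjusted() gives the exponent of the most significant digit,
    # i.e. floor(log10(upper)); scaleb turns it into a power of ten.
    unit = Decimal(1).scaleb(upper.adjusted())
    if spec_unit is not None:
        # The rounding unit may not be greater than the specification unit.
        unit = min(unit, Decimal(str(spec_unit)))
    return unit

def round_result(value, sd, spec_unit=None):
    """Round a test result to the rounding unit derived from the repeatability SD."""
    unit = rounding_unit(sd, spec_unit)
    return Decimal(str(value)).quantize(unit, rounding=ROUND_HALF_EVEN)

# Example from the quoted text: result 1.45729, estimated SD 0.0052
print(rounding_unit(0.0052))          # 0.001
print(round_result(1.45729, 0.0052))  # 1.457
```

Decimal arithmetic is used instead of floats so that the rounding unit and the reported digits are exact.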
Suggested Reading
1. Krause, S. O., Good Analytical Method Validation Practice, Part I: Setting-Up for Compliance and Efficiency. Journal of Validation Technology. Vol. 9 No. 1. November, 2002. pp 23-32.
2. International Conference on Harmonization (ICH), Q2A, Validation of Analytical Procedures. Federal Register. Vol. 60. 1995.
3. ICH, Q2B, Validation of Analytical Procedures: Methodology. Federal Register. Vol. 62. 1996.
4. United States Pharmacopoeia. USP 25 <1225>. Validation of Compendial Methods.
5. American Society for Testing and Materials (ASTM) E 29-02. Standard Practice for Using Significant Digits in Test Data to Determine Conformance with Specifications. July, 2002.
6. ASTM E 456-96. Standard Terminology for Relating to Quality and Statistics. September, 1996.
7. Miller, J. C. and Miller, J. N. Statistics for Analytical Chemistry. (2nd ed.). Ellis Horwood Ltd., England. 1988.