Item Parceling Strategies in SEM: Investigating the Subtle Effects of Unmodeled Secondary Constructs
ROSALIE J. HALL, ANDREA F. SNELL, AND MICHELLE SINGER FOUST
University of Akron
For theoretical and empirical reasons, researchers may combine item-level responses into aggregate item parcels to use as indicators in a structural equation modeling context. Yet the effects of specific parceling strategies on parameter estimation and model fit are not known. In Study 1, different parceling combinations meaningfully affected parameter estimates and fit indicators in two organizational data sets. Based on the concept of external consistency, the authors proposed that combining items that shared an unmodeled secondary influence into the same parcel (shared uniqueness strategy) would enhance the accuracy of parameter estimates. This proposal was supported in Study 2, using simulated data generated from a known model. When the unmodeled secondary influence was related to indicators of only one latent construct, the shared uniqueness parceling strategy resulted in more accurate parameter estimates. When indicators of both target latent constructs were contaminated, bias was present but appropriately signaled by worsened fit statistics.
Structural equation modeling (SEM), implemented with analytic tools such as LISREL (Jöreskog & Sörbom, 1989, 1993) or EQS (Bentler, 1989), is increasingly used to study proposed causal relationships among psychological constructs (Tremblay & Gardner, 1996). Unlike multiple-regression-based approaches to estimating structural paths, SEM techniques offer the potential to remove measurement error from estimates of structural relationships (e.g., Bollen, 1989; James, Mulaik, & Brett, 1982). This is done by separately modeling latent constructs and latent error terms so that the tests of the structural relationships can be made between the unbiased latent constructs, rather than between the observed variables, which incorporate not only the true influences of the latent construct but also systematic and random measurement errors (i.e., uniquenesses). Separating the construct-relevant variance from the uniqueness requires the explicit specification of a measurement model of the relationships between latent constructs and their indicators. Typically, a latent construct has several manifest indicators, with directional paths freed from the appropriate latent construct to relevant manifest indicator variables.
In practice, the number and forms of manifest indicators may vary considerably, and there are few firm guidelines for making these choices. However, choices about the types of indicators and the specification of the measurement model have implications for the extent to which bias can be removed from the latent construct and the construct adequately represented (e.g., see Bagozzi & Edwards, 1998; DeShon, 1998). In the current article, we argue that when aggregates of items known as item parcels or testlets are used as manifest indicators, the accuracy of parameter estimates and the diagnosticity of the goodness-of-fit test may depend on how the parcels are created.

Item parcels differ from subscale or scale scores in that the entire set of item parcels reflects a single primary factor dimension, or latent construct, whereas a set of subscale or scale scores reflects several separable (though generally closely related) latent constructs. The idea of creating and using item parcels is not a new one; it was originally introduced by Cattell (1956) and further explored by Cattell and Burdsal (1975). Other researchers in the fields of psychology and educational testing (e.g., Lawrence & Dorans, 1987; Manhart, 1996; Marsh, 1994; Schau, Stevens, Dauphinee, & Vecchio, 1995; Thompson & Melancon, 1996; West, Finch, & Curran, 1995) have suggested using item parcels as indicators of the latent constructs in SEM analyses to address problems with large sample size requirements, unreliability, and nonnormal or coarsely measured item-level data. However, these works have not addressed conceptual issues concerning how item parcels may or may not affect the estimation of relationships involving the latent construct.

The current article reviews some previous recommendations for creating composite indicators and suggests that under certain conditions these recommendations may influence fit statistics and lead to biased parameter estimates. Results from the analysis of two organizational data sets supporting this argument are briefly presented. A theoretical explanation for this effect is advanced and then further explored in a set of Monte Carlo simulations.
Choosing an Indicator Structure
The choice of an indicator structure requires careful consideration both of the study's purpose and of the conceptualization of the latent constructs. Latent constructs with single indicators can be problematic, and although used with some frequency in organizational SEM models, single indicators may (a) make achieving model identification more difficult (Bollen, 1989), (b) be associated with a higher likelihood of improper solutions (Ding, Velicer, & Harlow, 1995), and (c) require analyses (e.g., coefficient alpha) external to SEM if error is to be modeled and thus removed from the latent constructs. Because of these considerations, it is preferable to use at least three or four indicators per latent construct to ensure identification, increase the chances of proper solutions, and allow one to estimate latent errors (e.g., Bollen, 1989). The multiple indicators of the latent constructs may be at different levels, as suggested by Bagozzi and Edwards (1998). Their work explores the implications of different levels of aggregation on the construct validity of latent variables. Four different indicator depths are considered, reflecting increasing levels of aggregation of the indicator variables: (a) individual item responses are used as separate indicators, resulting in a total disaggregation model; (b) indicators consist of composite subsets of items
(indicator scores are created by aggregating item responses), resulting in a partial disaggregation model; (c) scores from preexisting facets or scales are used as indicators, resulting in a partial aggregation model; and (d) indicators consist of aggregates of scale scores, resulting in a total aggregation model. All of these aggregation depths have been employed in organizational research (Snell, Hall, & Foust, 1997). In cases where the construct of interest is broadly defined or the measurement scale has multiple subscales or subfacets, indicators at a high level of aggregation may be most appropriate. However, in many cases there is really only one good measure of the construct of interest, or there are organizational constraints on the number of scales that may be included in a questionnaire. In these situations, the researcher is left with two choices: Individual scale items can be used as indicators (a total disaggregation model), or subsets of items can be summed or averaged to form item parcels, which then serve as indicators for a partial disaggregation model.

Item parcels may be preferred over individual items as indicators for a variety of reasons. The composite-level indicators tend to be more reliable and normally distributed, and to have values that are more continuously distributed. In addition, some Monte Carlo research suggests that as the number of indicators per factor increases, there are accompanying decreases in the value of a number of commonly used fit indices (Anderson & Gerbing, 1984; Ding et al., 1995; Williams & Holahan, 1994). This may occur in part because, as the number of indicators increases, there is greater potential for shared secondary influences and cross-loadings among the indicators. These sources of contamination are frequently not explicitly modeled and thus contribute to overall lack of fit of the model. (This does not mean that the fit indices are incorrect. Rather, rules of thumb for acceptable fit often do not directly take into account the indicator/factor ratio.) Thus, many researchers opt for an indicator structure that avoids this problem by using three to four indicators per latent construct rather than a larger number of indicators.

Another reason that item parcels are frequently chosen is related to sample size requirements. When a larger number of indicators per latent construct is used, the model will typically have more free parameters. Some rules of thumb for determining adequate sample size are based on the ratio of estimated parameters to respondents (e.g., see Bentler & Chou, 1987; Bollen, 1989; Tanaka, 1987). By implication, increasing the number of indicators directly affects the sample size requirements for the study. This has led to a commonly held view that the number of indicators per factor should be limited (e.g., to three or four), especially with small sample sizes.

Based on the results of a recent simulation study by Marsh, Hau, Balla, and Grayson (1998), however, we suggest that small sample size not be the sole rationale for choosing to use item parcels. Marsh et al. found that with small sample sizes (≤ 100), 4 or more indicators per factor were necessary to ensure proper solutions. Furthermore, their results consistently showed that it was better to have more indicators per construct, even though higher ratios of indicators per factor resulted in lower fit indices. Higher ratios increased the likelihood of a proper solution and produced more accurate parameter estimates. Indeed, in the Marsh et al.
simulations, parcels did not offer any particular advantages over the use of individual items in terms of convergence to a proper solution or accurate parameter estimation, although item parcels performed comparably to items when the number of parcels was greater than 3 and sample size was greater than 100. However, some aspects of their simulations may have been
uncharacteristic of typical organizational data sets (for example, all items in the simulation had equal saturation of the latent construct).

In sum, there is no unqualified evidence that item parcels should be used instead of individual items, even if the number of items is very large. But given some of the common frustrations produced by large indicator/factor ratios, nonnormal data, and the sheer complexity of working with models with large numbers of indicators, researchers will continue to view item parcels as an attractive option. Given this interest, it is important to know the effects of item parceling on the accuracy of the SEM analysis.

Random versus planned aggregation strategies. One important question concerns the manner in which parcels are created. When preexisting subscales or scales are used as indicators, the combination of items is by definition based on an a priori structure that has a previous theoretical and/or empirical basis. However, when item parcels are used, groupings may be formed using a random or quasi-random procedure, or groupings may be made based on a theoretical or empirical rationale developed by the researcher. For example, Schau et al. (1995) deliberately spread negatively worded items across parcels, with the goal of producing parcels that were equivalent in terms of mean, standard deviation, and skew. Lawrence and Dorans (1987) and Manhart (1996) balanced the assignment of items of varying difficulty across parcels. Other anecdotal suggestions include successively combining the highest- and lowest-loading items from an exploratory factor analysis, or combining the items with the highest and lowest item-total correlations, reflecting a strategy of equalizing the influence of the primary factor across item parcels. All of these approaches, though not necessarily yielding the same combination of items into parcels, imply that some combinations are preferable to others. In contrast, recommendations to create item parcels using a random procedure imply that the choice makes no difference, or at least that there can be no rational basis for making the choice.

In spite of the variety of recommendations, there is relatively little published work on the implications of choosing different grouping strategies, and no well-elaborated theoretical rationale for the choice. For example, are the fit, parameter estimates, and meaning of SEM models affected by how items are parceled? In this article, we argue that the way in which items are combined into parcels can noticeably influence the results of SEM analyses of the type of data typically collected in psychological research. And, in contrast to several of the authors mentioned above, we argue that there are advantages to placing more similar items together into the same parcel rather than balancing or distributing them equally across parcels.
Theoretical Rationale for Expecting Differences in Parceling Strategies
As a starting point, the set of items to be parceled is assumed to be unidimensional in the sense that the results of an exploratory factor analysis (EFA) would typically be interpreted as supporting a single-factor structure. That is, all items have strong loadings on a primary factor, and the eigenvalues for any additional factors are substantially lower than that for the first factor, with values of less than one or a clear break in a scree plot of eigenvalues. Yet even when this condition is met, some of the items are likely to share one or more weak, secondary factor influences. These secondary factors may represent contamination from other psychological constructs, or they may be
Figure 1: Model Showing Items With Shared Secondary Influence Placed in Same Parcel
Note. ξ = latent exogenous construct, λ = factor loading, and δ = uniqueness.
methods factors. Because of the sensitivity of SEM analyses to even very minor model misspecifications, these secondary factors have the potential to influence indicators of model fit and to influence parameter estimation. We believe that it is these weak (but shared) secondary influences that in part determine whether items should be placed together into the same item parcel or not. The concept of external consistency, which is discussed in more detail below, offers a starting point for thinking about the implications of different item parceling strategies.

External consistency of items. Gerbing and Anderson (1988) suggest that two conditions are necessary for establishing the unidimensionality of a set of indicators: internal and external consistency. Readers are likely to be more familiar with the concept of internal consistency than external consistency. Unlike internal consistency, which represents the extent to which items within a scale share common variance, external consistency evaluates the extent to which each item shares variance with items outside the scale. That is, external consistency depends on the degree to which items in the set are influenced by extraneous secondary constructs. External consistency is violated when a secondary dimension is a cause of two or more indicators in a model. SEM analyses are sensitive to both the internal and external consistency of the indicators in the model because the manifest error terms or uniquenesses (δs and εs) include both random error and systematic error that may be due to an unmodeled external factor influence.

If we translate Gerbing and Anderson's (1988) work into the context of an SEM measurement model using item parcels, the implications become clearer. Compare Figures 1 and 2, which show six item-level indicators that all reflect a primary factor representing the construct of interest. In both of these figures, Items 1 and 2 also reflect
Figure 2: Model Showing Items With Shared Secondary Influence Placed in Different Parcels
Note. ξ = latent exogenous construct, λ = factor loading, and δ = uniqueness.
a secondary influence, such as might result when responses to the item are influenced by an additional substantive construct or by a methods factor such as negative wording or social desirability. To simplify the example, assume also that the primary and secondary constructs are unrelated to each other. In both figures, the six items have been aggregated into three 2-item parcels. Suppose that the researcher has not explicitly acknowledged the secondary factor, so that an SEM model is tested including only paths from the primary factor to the indicators.

In Figure 1, the parcels are constructed so that the shared uniqueness of the two items with the secondary influence is isolated into a single parcel. Here, the estimates of the path coefficients (also referred to as λs or factor loadings) from the primary factor to each item parcel will depend solely on the strength of the primary factor influence on the item parcel. The errors estimated for each parcel will reflect random error variance for all three item parcels and, for Parcel 1, will additionally incorporate the unique variance due to the influence of the secondary factor on Items 1 and 2.

However, when the items with the shared secondary influence are placed into separate parcels (as shown in Figure 2), they reflect a strategy of distributing the items with shared uniqueness across parcels. In this case, the estimates of the path coefficients from the latent construct to the indicators will reflect both the influence of the primary factor and of the unmodeled secondary factor that is now common to both Parcel 1 and Parcel 2, because variance that is held in common between at least two indicators will be partially attributed to the primary factor. Resulting estimates of the relationship between the primary factor and Parcels 1 and 2 will be upwardly biased, and estimates of the relationship between the primary factor and Parcel 3 will be downwardly biased (because Parcel 3 does not share any of the secondary influence that has now been
absorbed into the definition of the primary factor). Thus, when the items with a shared secondary influence are placed into two or more separate item parcels, the measurement model is misspecified, resulting in inaccurate parameter estimates and a potentially poorer fit of the hypothesized model to the data.

Examples from the educational testing literature. Although there are no empirical investigations of the effects of an unmodeled secondary influence on parcel composition, there are a number of empirical studies from the educational testing literature that examine the issue of parceling strategies. Those familiar with this literature may recognize that our suggestion to combine items sharing unmodeled secondary influences into the same parcel may conflict with these strategies, which typically recommend distributing items with different characteristics (e.g., endorsement frequencies, negative wording, etc.) across item parcels. We suggest two reasons for this apparent discrepancy.

First, the opposing strategies may emanate from attempts to deal with very different situations with regard to the unidimensionality of the items to be combined into parcels. In the current study, we focus on relatively unidimensional scales. Consider, by contrast, Kishton and Widaman's (1994) study of the factor structure of a locus of control scale. A preliminary EFA suggested a three-factor structure for this scale. Two sets of item parcels were created, one in which the items identified in the EFA as belonging to the same subfacet were spread across the parcels and one in which items from each subfacet were combined together. The fits of the two models using the two different parcel sets were virtually identical; however, the authors favored the first strategy because the second strategy generated inadmissible factor intercorrelations. Although Kishton and Widaman's conclusions are contrary to those advocated here, they were working with a set of items that were not unidimensional enough to argue that the indicators reflected a single primary construct.

Second, much of the previous research uses item parcels to correct for violations of the multivariate normality and continuous measurement assumptions that are required for the maximum likelihood and generalized least squares (GLS) estimation techniques most commonly used in SEM. Here, the emphasis is on creating item parcels whose distributions show similar levels of normality (e.g., Thompson & Melancon, 1996), as well as equal standing on response factors such as item difficulty (Lawrence & Dorans, 1987; Manhart, 1996) and negative wording (Schau et al., 1995). Even though these studies demonstrate how such parcels can be constructed, they do not address whether these parceling strategies actually result in more accurate parameter estimates. The theoretical justifications are reasonable; however, we could find no empirical support that these strategies would result in more accurate parameter estimation and tests of model fit. We were most concerned with typical organizational measures; it is possible that the most appropriate parceling strategy depends on the characteristics of the items and the types of questions being addressed, and that these vary across disciplines.
Brief Overview of Studies 1 and 2
The next sections present the results of two investigations of the effects of different combinations of items into item parcels on indicators of fit and parameter estimates. For both studies, we tested the simplest path model incorporating parcels that we could
devise, consisting of a causal path between two latent constructs. The exogenous construct (independent variable) always had three parcels for manifest indicators, and the endogenous construct (dependent variable) varied slightly in form across the samples. For all analyses, the six items comprising the independent variable measure were aggregated to create all 15 possible combinations of three 2-item parcels, and a model was tested for each unique combination of items into parcels, to allow us to compare the parameter estimates and fit indices across different parcel combinations. (Parcels would typically not be formed with only six items, but we wanted to keep the number of potential combinations manageable.)

In Study 1, data from two organizational samples were analyzed to demonstrate that differences in fit and parameter estimates are not only hypothetically possible, but occur in practice. These data sets were chosen to represent data with characteristics typical of those seen in organizational research. The disadvantage of these real data sets, however, was that it was impossible to know the true underlying relationships. Thus, in Study 2 we turned to Monte Carlo simulations to test whether item parceling strategies that combine items with a shared secondary influence into the same parcel better recover known model parameters than do parceling strategies that separate items with a shared secondary influence.
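As an aside for readers who want to verify the count of parcel combinations, the short sketch below (our own illustration; the helper function is hypothetical and not part of the original study) enumerates all 15 ways of splitting six items into three unordered 2-item parcels:

def pair_partitions(items):
    """Enumerate all ways to split an even-sized list of items
    into unordered 2-item parcels."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partner in rest:  # pair the first item with each remaining item
        remaining = [i for i in rest if i != partner]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail

partitions = list(pair_partitions([1, 2, 3, 4, 5, 6]))
print(len(partitions))  # 15, i.e., 5 x 3 x 1 distinct parcel sets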
Study 1: Method

Biodata participants and measures. The biodata sample consisted of 461 customer service representatives employed in a large utility company. Responses to a six-item biodata measure of persuasive ability were used to create indicators of the independent variable. A Likert-type response scale ranging from 1 to 5 was used, with high scores indicating a high incidence of persuasive behaviors. Coefficient alpha for this scale was .65, which is consistent with previously reported reliabilities of biodata scales (Baehr & Williams, 1967; Owens, 1976). All items pertained to past behaviors and experiences (e.g., In the past, how easily have you been able to persuade others to do things your way?). The dependent variable was job performance. Performance was measured by collecting supervisor ratings of an employee's ability to interact with coworkers and supervisors. The format of these ratings is similar to the behavioral summary scaling procedure (Borman, 1979), and results in item scores ranging from 1 to 6, with 6 indicating the highest level of performance. This four-item measure had a coefficient alpha of .83.

Teamwork participants and measures. The second sample consisted of responses from 752 members of a large company in the communications industry. Six items were used to measure the independent variable of teamwork (e.g., Please rate your company on rewarding group effort). The dependent variable was a six-item measure of work and organizational satisfaction. Responses to both measures were made on a scale ranging from 1 (very good) to 5 (very poor). Coefficient alpha for the teamwork variable was .91, and for the satisfaction measure was .87.
Before creating the item parcels, a principal axis EFA was conducted on the six persuasive ability items and, separately, on the six teamwork items. Inspection of the scree plots and a parallel analysis criterion (Lautenschlager, 1989) supported the existence of a relatively strong primary factor for each set of items. For the teamwork sample, the first factor explained 69% of the variance, and the first two eigenvalues were 4.148 and 0.586. Factor loadings ranged from .78 to .82. For the biodata sample, the first factor accounted for 41% of the variance, and the first two eigenvalues were 2.45 and 1.00. Factor loadings for this sample ranged from .38 to .73.

Following the EFA, for each sample a preliminary path analysis model was tested using maximum likelihood estimation, with LISREL 8.12. In this model, item-level indicators were used for the independent variable (persuasive ability or teamwork). The item-level analyses were performed for two reasons: (a) to provide fit indicators and parameter estimates that could be compared to the models using parcels as indicators, and (b) to look for evidence of unmodeled secondary factor influences. The χ² goodness-of-fit statistic was significant for both samples, thus rejecting the null hypothesis of good fit. Also, both samples had some large modification indices suggesting that fit would improve if some error terms were allowed to covary. (These were consistent with the idea that the items might incorporate small, unmodeled secondary influences.) Thus, revised models were tested in which one error covariance was freed for the biodata model and seven error covariances were freed for the teamwork model. For the biodata sample, the revised model yielded a nonsignificant χ² value of 42.54, as well as a Comparative Fit Index (CFI) of .99 and a Standardized Root Mean Square Residual (SRMR) of .033. Thus, this model fit the data extremely well. Although the χ² of 40.54 was still significant for the teamwork sample, other indicators suggested an adequate fit (CFI = .99, SRMR = .019). The standardized item loadings for the persuasive ability indicators ranged from .41 to .63; for the teamwork indicators, these loadings ranged from .78 to .81. (All factor loadings were statistically significant.) The estimated path coefficient from persuasive ability to job performance had a standardized value of .09, and was nonsignificant. The estimated path from teamwork to satisfaction was .76 and was significant.
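Parallel analysis, mentioned above (Lautenschlager, 1989), retains factors whose observed eigenvalues exceed those expected from random data of the same dimensions. A minimal sketch of the procedure, assuming standardized items, is given below (the function and variable names are ours, not the authors'):

import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    # Compare observed correlation-matrix eigenvalues with the mean
    # eigenvalues of same-sized random normal data.
    rng = np.random.default_rng(seed)
    n, k = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand = np.zeros((n_sims, k))
    for s in range(n_sims):
        sim = rng.standard_normal((n, k))
        rand[s] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    return obs, rand.mean(axis=0)

# Retain the factors whose observed eigenvalue exceeds the random mean:
# obs, rand = parallel_analysis(item_scores); n_factors = int((obs > rand).sum())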
Study 1: Results
For the biodata sample, χ² values ranged from 13.57 to 22.71 across the 15 parcel combinations. All models had 13 degrees of freedom, and the χ² values for 14 of the 15 models had probability values of greater than .05 and thus would have been judged to fit well. For the teamwork sample, the χ² values ranged from 7.50 to 40.12, with 7 degrees of freedom. Six of the models had χ² values with p greater than .05. These 6 models would have been judged to fit well on the basis of the χ² test. The remaining 9 models, however, had χ² values with ps ranging from less than .05 to less than .0001, and would have led to a conclusion of significant model misfit. The range of χ² values found with both data sets raises doubt about the assumed arbitrariness of the item parcel combination and the resulting conclusions that would
be drawn by the researcher. In both samples, the null hypothesis of good fit would have been rejected for one or more of the models, but not for others. In addition, there was appreciable variability in other fit indicators, in particular the Root Mean Square Error of Approximation (RMSEA). The values of the estimated structural parameters (γ) also demonstrated some variability. In the biodata sample, this variability would have implications for the substantive conclusions drawn from the test of the model: the path coefficients ranged from a value of .06, which is a nonsignificant relationship in this sample, to .11, which indicates a statistically significant path. The estimated path coefficients for the teamwork sample ranged from .74 to .77, and all were statistically significant.

These results demonstrate that different combinations of items into parcels can produce different fit statistics and parameter estimates. To eliminate the possibility that the fluctuations in model fit were related to differences in the normality of the distributions of the parcels that were created, the skew and kurtosis of all parcels were compared. Differences in the distributions of the item parcels had no discernible connection to the fit of the different models. In addition, the modification indices for the uniquenesses of the item parcels were examined to determine if they could be used to predict which item combinations were especially likely to contribute to model misfit. However, this led to no clear-cut conclusions, potentially because of the simultaneous presence of more than one secondary influence. Thus, we turned to the analysis of data produced by Monte Carlo simulations, where the true underlying measurement model could be known.
Study 2: Method
To test the feasibility of the secondary construct explanation developed in the introduction, the Monte Carlo capabilities of PRELIS 2/LISREL 8 were used for three simulations, each of which generated 500 simulated data sets. Each of the simulated data sets contained 500 observations on 10 variables. Of the 10 variables, 6 were created to simulate item-level indicators (X1 to X6) of the latent independent (exogenous) variable and were then paired to make all possible different combinations of three 2-item parcels, in the same manner as for Study 1. The remaining 4 variables were indicators of the latent dependent (endogenous) variable. A covariance matrix containing all possible item parcel combinations plus the dependent variable indicators was output for each of the simulated data sets, and then used as input to a LISREL model proposing a directional path between two latent constructs, ξ1 (the exogenous, or independent, latent variable) and η1 (the endogenous, or dependent, latent variable), again in a manner very similar to those used for the biodata and teamwork samples. Thus, for each of the three simulations, a total of 7,500 (15 possible combinations of parcels × 500 data sets) LISREL analyses were performed. The appendix contains the PRELIS program lines used to generate the simulated data sets, based on procedures outlined in Jöreskog and Sörbom (1994, pp. 14-16).

The 500 data sets of Simulation 1 were generated from a very simple model in which all six item-level indicators of the primary exogenous construct (ξ1) had loadings of .65, plus a random error component. Simulation 1 included no variables with secondary construct influences. In Simulation 2, all six indicators of the exogenous construct (ξ1) had unstandardized loadings of .65 on the primary construct, and two of the indicators (X1
and X2) had additional loadings of .50 and .40 on a secondary construct (ξ2). In all data sets, the value of the path from the primary exogenous construct (ξ1) to the endogenous construct (η1) was set to .60 (random error was also added in the modeling of this path). In Simulation 3, both a primary and a secondary exogenous construct influenced the indicators of the independent variable, exactly as in Simulation 2. However, in this model, the secondary exogenous construct (ξ2) additionally had a weak influence (.20) on the endogenous construct (η1). This second latent construct could represent either a substantive factor or a methods factor that influences the observed value of two of the manifest X variables in combination with ξ1.

To summarize the results of the simulations, the fit indicators and parameter estimates were averaged across the 500 data sets tested for each of the 15 item parcel combinations. Given that we were testing a 13 df model, the expected value of χ² for a perfectly fitting model would be equal to 13, with p = .50. The value of CFI should be close to 1.00, and small values should be observed for RMSEA and SRMR. For Simulations 2 and 3, mean fit indicators and parameter estimates were also determined for all models using the same general item parceling strategy (i.e., isolated vs. distributed uniqueness), and significance tests were conducted to determine if the observed differences were statistically significant.

Because of the effects of the unmodeled secondary construct in the Simulation 2 and 3 data sets, we expected to see more variability in the LISREL indicators across all possible item parcel sets for these two simulations than for Simulation 1. In other words, the models for Simulations 2 and 3 were misspecified because they did not explicitly model the effects of the secondary construct. We expected that item parcel combinations that placed the two items sharing the unmodeled secondary influence into the same parcel (thus isolating this undesired shared variance into the uniqueness term for the indicator) would show better model fit and more accurate parameter estimates than combinations that separated the two items into different parcels. Simulation 3 should demonstrate how fit and parameter estimates are affected when the unmodeled secondary construct not only influences indicators of the independent variable but is also related to the dependent variable, further complicating the effects on fit indices and parameter estimates.
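Readers without access to PRELIS can reproduce the Simulation 2 generating model in a few lines. The following numpy sketch is our own translation of the appendix program (the variable names are ours), not the code used in the study:

import numpy as np

rng = np.random.default_rng(42)
N = 500  # observations per simulated data set

# Latent constructs: primary exogenous, secondary, and endogenous
ksi1 = rng.standard_normal(N)
ksi2 = rng.standard_normal(N)
eta1 = 0.60 * ksi1 + rng.standard_normal(N)  # structural path set to .60

# Six exogenous indicators; X1 and X2 also load on the secondary construct
sec = [0.50, 0.40, 0.0, 0.0, 0.0, 0.0]
X = np.column_stack([
    0.65 * ksi1 + sec[j] * ksi2 + rng.uniform(size=N)  # URAND-style errors
    for j in range(6)])

# Four endogenous indicators with loadings of .70
Y = np.column_stack([0.70 * eta1 + rng.uniform(size=N) for _ in range(4)])

# Isolated uniqueness parcel set: X1 and X2 share a parcel
parcels = np.column_stack(
    [X[:, 0] + X[:, 1], X[:, 2] + X[:, 3], X[:, 4] + X[:, 5]])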
Results
Simulation 1
For Simulation 1 data, mean fit indicator and parameter estimate values were calculated over all 500 data sets and all 15 possible item parcel combinations, and thus were based on 7,500 LISREL analyses. Because no secondary influence was included in the generation of the data, the tests of the LISREL models should recover the true values of the parameter estimates and should fit well. Indeed, all fit indicators suggested a close-to-perfect fit of the model to the data. The χ² goodness-of-fit values had a mean of 13.43 (range = 13.15 to 13.81), df = 13, average p = .48. The values of RMSEA ranged from .011 to .012, and the CFI was equal to 1.00 for all models. The mean values of the parameter estimates for Simulation 1 (reported in Figure 3) provide a standard against which to evaluate the values recovered from Simulations 2 and 3. These values were exactly what would be expected given knowledge of the
Figure 3: Mean Unstandardized Parameter Estimates for Simulation 1 (no secondary influence)
Note. ξ = latent exogenous construct, and η = latent endogenous construct.
parameters that generated the data sets. On the exogenous variable side, each item parcel had a loading (λx) of 1.30; that is, twice (because there were two items per parcel) the simulation value of .65. Similarly, each error term for the item parcel was .167, twice the variance of the uniform random variable specified as the error term for each item in the simulation. The estimate of the path from the exogenous to the endogenous variable was .42. This value is equal to the simulation value of .60, standardized by dividing by the product of the standard deviations of the exogenous (1.00) and endogenous (0.70) latent constructs.
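As a quick check of this arithmetic (our own illustration): item loadings add within a parcel, and with independent uniform(0, 1) errors, each item contributes an error variance of 1/12, so the expected parcel error variance is 2/12 ≈ .167:

item_loading = 0.65
uniform_error_var = 1 / 12                 # variance of a uniform(0, 1) variable

parcel_loading = 2 * item_loading          # 1.30; item loadings add within a parcel
parcel_error_var = 2 * uniform_error_var   # ~.167; independent error variances add

print(parcel_loading, round(parcel_error_var, 3))  # 1.3 0.167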
Simulation 2
Isolated uniqueness strategy. In Simulation 2, there were two different strategies for creating item parcels. The isolated uniqueness strategy combined Items 1 and 2, which shared the secondary influence, into the same item parcel (Parcel 1). The isolated uniqueness strategy was used in 3 of the 15 item parcel combinations tested, and the summarized results for this strategy are thus based on 1,500 LISREL analyses. The χ² values for the isolated uniqueness models showed an excellent fit, with a mean of 12.69 (range = 12.26 to 12.73, df = 13, average p = .52). RMSEA had a mean of .010, and ranged from 0 to .067. The mean CFI was .99, with values ranging from .99 to 1.00. The mean parameter estimate for the path coefficient relating the two latent constructs was correct, as were the estimates of the loadings for the indicators of the exogenous construct (ξ1) and the disturbance term for the endogenous construct (ζ) (see Figure 4). Not surprisingly, the error variance for Parcel 1, which contained the two items sharing the secondary influence, was larger than the error variances of the other two item parcels. This larger value is because the errors for Parcels 2 and 3 include only
Figure 4: Mean Unstandardized Parameter Estimates for Simulation 2: Models With Unmodeled, Shared, Secondary Influence Isolated in Parcel 1
Note. ξ = latent exogenous construct, and η = latent endogenous construct.
random error, whereas the error term for Parcel 1 reflects both random error and the unique variance due to the unmodeled secondary influence.

Distributed uniqueness strategy. In contrast, the Simulation 2 models that placed the items with a shared secondary influence into separate item parcels did not fit as well. This distributed uniqueness strategy was used for 12 of the 15 item parcel combinations tested; therefore, the reported means are based on 6,000 LISREL analyses. For these models, the mean χ² was 20.29, with values that ranged from 19.92 to 20.71. The difference between the mean χ² values for the two different parceling strategies used in Simulation 2 was statistically significant, t = 47.18, p < .001. The mean RMSEA for these models was .030, and ranged from 0 to .090. The average CFI was 1.00, and values ranged from .96 to 1.00.

Not only did indices of model fit differ for the two parceling strategies, but several of the mean parameter estimates from the Simulation 2 distributed uniqueness models were biased (see Figure 5). The pattern of changes in the parameter estimates suggested that the exogenous construct had been redefined so that it now partly reflected the secondary influence. This is corroborated by the upwardly biased loadings of the two item parcels influenced by the secondary construct (Parcels 1 and 2 had loadings of 1.38 versus the true value of 1.30) and the downwardly biased loading for Parcel 3 (1.23 vs. the true value of 1.30), which was not influenced by the secondary construct. Because the exogenous construct (ξ1 in Figure 5) has been redefined, the estimate of the relationship between it and the endogenous variable (η1) is attenuated from a true value of .42 to a value of .40. This occurs because the unmodeled secondary construct that now contaminates the definition of the primary construct is not related to the dependent variable.
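The direction of this bias can be seen directly from the implied covariances. In our notation (a sketch assuming unit-variance, uncorrelated latent constructs), the contaminated items are $X_1 = .65\,\xi_1 + .50\,\xi_2 + \delta_1$ and $X_2 = .65\,\xi_1 + .40\,\xi_2 + \delta_2$. Under the distributed strategy, with $P_1 = X_1 + X_3$ and $P_2 = X_2 + X_4$,

$$\mathrm{Cov}(P_1, P_2) = \underbrace{(1.30)(1.30)}_{\text{primary}} + \underbrace{(.50)(.40)}_{\text{secondary}} = 1.89,$$

but a one-factor measurement model can reproduce this covariance only as $\lambda_1 \lambda_2$, so the extra .20 of shared secondary variance inflates the loadings of $P_1$ and $P_2$, with the loading of $P_3$ shrinking to compensate.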
Figure 5: Mean Unstandardized Parameter Estimates for Simulation 2: Models With Unmodeled, Shared, Secondary Influence in Separate Parcels. Parcel 1 Has Stronger Secondary Influence and Parcel 2 Has Weaker Secondary Influence
Note. ξ = latent exogenous construct, and η = latent endogenous construct.
Simulation 3
Isolated uniqueness strategy. Finally, Figures 6 and 7 present the results of using the two item parceling strategies on Simulation 3 data. Recall that the true model underlying the data from Simulation 3 was identical to that of Simulation 2, except that the secondary construct now had a causal influence not only on two items reflecting the exogenous construct but also directly on the endogenous construct. The results for the isolated uniqueness strategy suggest increased model misfit: the mean χ² was 26.91, with values ranging from 26.77 to 27.11. The mean probability value associated with the χ² test was .08, and importantly, 65% of the p values were less than or equal to .05 and would have led to a correct rejection of this misspecified model. The mean RMSEA equaled .043 for all relevant models, with values ranging from 0 to .11. As can be seen in Figure 6, the loading for Parcel 1, which contains the shared secondary influence, is upwardly biased (1.57 vs. a true value of 1.30), although the loadings for the other two parcels are correct. The estimate of the path coefficient relating the exogenous to the endogenous latent construct is also upwardly biased, with a value of .47 as opposed to the true value of .42. This occurs because the estimate of this path now confounds the true relationship of these two constructs with the additional relationship between the secondary influence and the dependent construct.

Distributed uniqueness strategy. For Simulation 3, when the distributed uniqueness strategy was used, the χ² values improved significantly over the just-reported Simulation 3 results for the isolated uniqueness strategy, t = 52.02, p < .001. The mean value of χ² for the distributed uniqueness strategy was 13.63 and ranged from 13.40 to
Figure 6: Mean Parameter Estimates for Simulation 3: Models With Unmodeled, Shared, Secondary Influence Isolated in Parcel 1; Secondary Influence Affects Endogenous Variable
Note. ξ = latent exogenous construct, and η = latent endogenous construct.
Figure 7: Mean Parameter Estimates for Simulation 3: Models With Unmodeled, Shared, Secondary Influence in Parcels 1 and 2; Secondary Influence Affects Endogenous Variable
Note. ξ = latent exogenous construct, and η = latent endogenous construct.
13.88, suggesting an excellent fit. The mean probability value associated with the χ² test was .47, and for all of the data sets the probabilities were greater than .05. We emphasize this point because it means that based on the probability level associated
with the χ² goodness-of-fit value, which is considered to be a stringent criterion, the incorrectly specified model would not have been rejected in any of the analyses where the distributed uniqueness strategy was used. The mean RMSEA value was .012, with values ranging from 0 to .097.

In the set of models using the distributed uniqueness strategy, all three of the paths from the latent exogenous construct to the item parcels were biased (1.52, 1.28, and 1.24, compared to a true value of 1.30), in contrast to the isolated uniqueness strategy models, in which bias was observed in only one of these paths. The path coefficient was slightly more upwardly biased than in the isolated uniqueness models of Simulation 3, with a mean value of .48 rather than the true value of .42 (see Figure 7), again because of the confound with the unmodeled secondary construct. Thus, in Simulation 3, both item parceling strategies led to biased estimates of the path coefficient, because the model being tested did not explicitly include the effects of the secondary construct. However, in Simulation 3, when the distributed uniqueness strategy was used, the misfit was not readily detected because the values of χ² and RMSEA suggested an excellent fit of the model to the data. In contrast, the larger χ² and RMSEA values of the models using the isolated uniqueness strategy provided better clues that the model was indeed misspecified.
Discussion
As described in the introduction, there are a variety of theoretical and practical reasons for researchers to consider using item parcels, including keeping the ratio of manifest indicators to latent constructs manageable, reducing the number of free parameters in the model to decrease sample size requirements, and increasing the chances of adequate model fit. Most important among these reasons, however, is the issue of good construct representation, as raised by Bagozzi and Edwards (1998). Their empirical data suggested that in comparison to more aggregate indicators, item parcels and item-level indicators provided the best representation of the multifaceted scale that they studied (the Work Aspect Preference Scale). That an appreciable number of other researchers also choose to use item parcels is supported by the literature. We reviewed all articles published in the Journal of Applied Psychology between 1990 and 1996, and found that 17% of the 48 articles that used SEM analyses to test causal models relied on item parcels as manifest indicators for at least some latent constructs in the model (details available from the authors).

Bagozzi and Edwards (1998) proposed four general guidelines for creating item parcels in their discussion section, including rules that sets of items be unidimensional and that items combined into one parcel should be at the same level of specificity and constitute independent observations from items in another parcel. They contend that these guidelines will make measurement more precise and, most important, reduce the chance of including other constructs such as subordinate factors (i.e., secondary influences), superordinate aspects (i.e., higher level factor or method effects), or covarying variables. We believe our study results augment their list, and provide some explanations and boundary conditions for their guidelines.
Demonstration that parceling strategy makes a difference. The analyses of both the empirical data sets (biodata and teamwork data) and the simulated data sets clearly demonstrate that the particular combination of items into parcels can systematically influence fit statistics and parameter estimates. Tests of the difference between the mean χ² values for the isolated uniqueness strategy versus the distributed uniqueness strategy were statistically significant for both Simulation 2 and Simulation 3 of Study 2 (Simulation 1 items had no secondary influence). In the teamwork sample of Study 1, 40% of the item parcel combinations resulted in a nonsignificant χ² value, whereas 60% did not. In the biodata sample of Study 1, estimates of the relationship between the independent and dependent constructs were statistically significant for 27% of the models and not for 73% of the models. And perhaps more important, the simulation results clearly showed that when a secondary influence was present, parameter estimates of models using parcels could be biased. Taken together, these findings demonstrate the fallacy of assuming that a strategy of randomly generating item parcels will produce trivial differences in SEM results.

Enhanced theoretical explanation of observed differences across item parcel combinations. The Study 1 analyses suggested that unmodeled secondary influences might explain the observed differences in SEM results across different item parcel combinations. To verify this impression, three sets of simulated data sets were created for Study 2. These demonstrated that the choice of item parceling strategy was essentially arbitrary when no secondary influences were present, as shown in Simulation 1. When secondary influences were present, as in Simulations 2 and 3, two clearly differentiated patterns of results emerged: one pattern was characteristic of combining items sharing the secondary influence into a single parcel (the isolated uniqueness strategy), and the other pattern was characteristic of combining items sharing the secondary influence into different parcels (the distributed uniqueness strategy).

When a distributed uniqueness strategy was used in Simulations 2 and 3, the secondary factor was partially subsumed in the representation of the primary latent construct. This confound biased both the estimates of the factor loadings and the path coefficient for the causal relationship between the primary exogenous construct and the endogenous construct. In Simulation 2, where the unmodeled secondary factor was related to the primary exogenous construct but not to the endogenous construct, the estimate of the path coefficient was attenuated. In Simulation 3, where the secondary factor was related to both the exogenous construct and the endogenous construct, the estimate of the path coefficient was inflated.

In contrast, when an isolated uniqueness strategy was used in Simulation 2, the factor loadings and path coefficient were correctly estimated. In Simulation 3, the isolated uniqueness strategy did not protect against biased estimates. However, when the isolated uniqueness strategy was used in Simulation 3, the χ² test correctly gave indications of model misspecification, whereas the distributed uniqueness strategy did not. The simulation results were consistent with theoretically derived expectations based on the measurement concept of external consistency. When items with a shared
secondary influence are placed into the same item parcel, the external consistency of the measure is increased by relegating a secondary influence to the uniqueness term of a single item parcel, so that no two parcels are both influenced by the same secondary construct. SEM is sensitive to the external consistency of a measure, and provides a more stringent assessment of its unidimensionality (Gerbing & Anderson, 1988). The isolated uniqueness strategy for creating item parcels can increase the unidimensionality of the latent construct by improving the external consistency of the manifest indicators.
Boundary Conditions for Generalizing Results
Especially because these results are based on a relatively small number of variables and parcel combinations, as well as a limited range of parameter values, we are reluctant at this point to generalize widely. The extent to which these results are likely to apply to other situations may depend on several boundary conditions, including the following: (a) the extent to which the item set is unidimensional, (b) whether the secondary influence affects indicators for only one (versus more than one) construct, and (c) the number of unmodeled secondary influences present. These are elaborated in the following sections.

Unidimensionality of the item set. Our study results may apply only to situations where the set of items to be combined into parcels is unidimensional. Although parceling effects occur because one or more quite weak secondary constructs are present, an EFA of the items used in our samples and simulations would demonstrate that all items show a substantial influence of the primary construct, and any secondary factors are weak enough that they would ordinarily be disregarded. In the simulations, the eigenvalue for the primary factor was six times larger than the eigenvalue for the secondary factor. Thus, the simulation results demonstrate meaningful effects with weak secondary factors. This ratio may be typical for some types of organizational measures (e.g., organizational climate, job satisfaction, commitment); however, researchers using other types of measures (e.g., personality) may not have such strong unidimensional structures. In fact, the two Study 1 data sets exemplify this discrepancy: the ratio of first to second eigenvalues for the teamwork sample was 7 to 1; for the biodata sample, it was 2.45 to 1.

Recently, Snell, Hall, Davies, and Keeney (1999) examined several empirical data sets with differing levels of secondary factor contamination in an effort to evaluate the robustness of the isolated uniqueness strategy. In general, they found that the various combinations of items into item parcels yielded relatively consistent parameter estimates when the ratio of the eigenvalues for the first factor to the second factor was above 2.5. However, when the primary factor was only twice as large as the secondary factor, the primary factor saturation of the items fluctuated considerably and the various item parcel combinations produced more inconsistent parameter estimates. These results suggest that the isolated uniqueness strategy may be most effective when the secondary factor can be easily identified and it only weakly influences the items defining the primary construct.

When the unidimensionality condition is not met, the isolated uniqueness strategy may not be optimal. Violation of this condition may be one explanation for why some authors in the educational testing literature (e.g., Kishton & Widaman, 1994;
Lawrence & Dorans, 1987; Manhart, 1996; Schau et al., 1995; Thompson & Melancon, 1996) recommend a strategy of placing similar items into different parcels, a variant of what we have called the distributed uniqueness strategy. It may be that when the set of items is unidimensional, the isolated uniqueness strategy is best, but when the items are multidimensional, a parallel strategy is best. However, it may also be that the better fit achieved with a parallel strategy in this circumstance masks serious measurement problems that would be better addressed by respecifying the model. We did not address this issue in the current study.

The secondary influence affects indicators of only one construct. In Simulation 2, the isolated uniqueness strategy resulted in correct parameter estimates and better model fits, whereas the distributed uniqueness strategy led to some biased estimates and poorer overall model fit. However, this was not true in Simulation 3, where both strategies resulted in biased estimates of some parameters and the isolated uniqueness strategy models showed poorer overall model fit. The difference between these two simulations was that in Simulation 2, the influence of the secondary factor was restricted to the items that served as indicators of the primary exogenous construct. In Simulation 3, the secondary factor influenced both the indicators of the primary exogenous construct and the endogenous construct. Thus, we conclude that the isolated uniqueness strategy will not protect against biased estimates when the secondary factor influences more than one construct.

However, the results of Simulation 3 clearly demonstrate the potential danger of selecting a set of item parcels because it maximizes model fit. In this simulation, the models using the isolated uniqueness strategy showed poorer fit than did the distributed uniqueness models. The poorer fit of the isolated uniqueness models correctly flagged an unmeasured variables problem (e.g., see James, 1980). The isolated uniqueness models forced the influence of the unmodeled factor into a single indicator, resulting in a substantial uniqueness term that covaried with the endogenous variable. Because the model did not explicitly include a path or covariance relating the uniqueness to the endogenous construct, the fit of the model suffered. In contrast, when the distributed uniqueness strategy was used, variance related to the secondary construct was interpreted as shared variance and to some extent was incorporated into the primary construct. This inflated the estimated causal relationship between the exogenous and endogenous variables, and resulted in an apparently better fit of the (incorrect) model to the data, because now some of the relationship between the secondary construct and the endogenous construct was accommodated in the model, albeit in a misleading manner. Although the model appeared to fit better, the assumption that all relevant causes of the endogenous variable were included in the model was still violated. Thus, the models employing the isolated uniqueness strategy were more desirable because they correctly demonstrated poor model fit and served to alert the researcher to a model specification problem.

Number of secondary influences. The simulations performed in this study were probably oversimplified in the sense that responses to items may well be influenced by more than one secondary construct.
The resulting effects on model fit and parameter estimates become correspondingly more difficult to anticipate, and trade-offs may have to be made if a particular item shares multiple secondary influences with other items. Although the present study did not examine such complex relationships, we are hopeful that future studies can address issues such as the following:
Can multiple secondary constructs cancel out their effects on model fit and parameter estimation? What, if any, is the boundary between an irrelevant secondary influence and a relevant (i.e., misleading or damaging) one?
Tentative Recommendations for Practice
Given the argument in Multivariate Behavioral Research concerning the use of composite scores versus individual items in measurement invariance studies (Drasgow, 1995; Labouvie & Ruetsch, 1995; McDonald, 1995; Nesselroade, 1995; Widaman, 1995), it is clear that SEM researchers do not believe that the choice of an indicator structure is arbitrary. However, we were surprised by the paucity of research and resulting practical guidelines for those wishing to conduct SEM analyses using item parcels. The results of the current study support combining items that share a secondary factor into the same parcel, to force this unmodeled influence into the uniqueness term and thereby isolate the factor loading and structural estimates as much as possible from the contamination of the secondary factor influence. However, one is left with the question of how to detect the possible presence of unmodeled secondary factors. At this point we can only make tentative recommendations.

First, a rational analysis of the item content might provide some clues about the nature of potential secondary factors. The content of the items defining the primary construct may be scrutinized with the goal of creating smaller subscales based on recognizable content differences or context differences (e.g., items concerning one's behavior with friends vs. with coworkers). There are also a variety of method effects that could constitute a substantial secondary factor, such as positive versus negative item wording, sequential placement in the questionnaire (e.g., next to items measuring a different construct), or more pervasive response tendencies such as negative affectivity or socially desirable responding.

Second, a factor analysis of the item-level data may be helpful. Either an EFA or a confirmatory factor analysis (CFA) could be used for this. With an EFA, the researcher would want to factor analyze all items from a scale, forcing from two to four factors, depending on the total number of items and the likely number of parcels to be created. Items with higher loadings on the same factor should be combined into the same parcel (a brief sketch of this procedure appears at the end of this list of recommendations). (Or, another strategy altogether would be to avoid parcels entirely by using as indicators only the items with high loadings on the primary factor and no cross-loadings. This might be best if the factor analysis results are complex.) If CFA is used, then the modification indices may be used to determine which items have uniquenesses that covary. These items would then be combined into the same parcel. Following this, the researcher might also want to carefully inspect the modification indices for the parceled indicators to determine if the parceling strategy has indeed isolated the strongest secondary influences.

Third, one might merely want to see if there is likely to be meaningful variability in results depending on how items are combined into parcels. In this case, one could try testing a number of models, using different item parcel combinations, and then comparing the differences in fit indices and parameter estimates. If there are many items, trying all possible combinations of parcels can get quite tedious; instead, a handful of combinations that differ substantially from each other in composition could be tested.
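The EFA-based grouping in the second recommendation could be prototyped along the following lines (a sketch using scikit-learn's FactorAnalysis; the grouping rule and all names are ours, not a procedure from the studies cited above):

import numpy as np
from sklearn.decomposition import FactorAnalysis

def efa_parcel_groups(item_scores, n_factors=3):
    # Assign each item to the rotated factor on which it loads most
    # strongly; items grouped together are candidates for the same parcel.
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax")
    fa.fit(item_scores)
    loadings = fa.components_.T                 # items x factors
    assignment = np.abs(loadings).argmax(axis=1)
    return {f: np.where(assignment == f)[0].tolist() for f in range(n_factors)}

# groups = efa_parcel_groups(X)   # X: respondents-by-items array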
Once potential secondary constructs have been identified, two strategies might be pursued. First, if the researcher is fortunate enough to have a measure of the unmodeled secondary factor, new variables could be added to the model and explicit paths included to the relevant items. For example, if the secondary factor is a methods factor, one might employ method bias analyses similar to those detailed in Williams and Anderson (1994), in which the effects of positive and negative affectivity were modeled as additional causes of the indicators of substantive constructs. Although cumbersome, SEM analyses that explicitly model multiple causes for the indicators help mitigate the problem of relevant unmodeled variables. However, it may not always be possible to determine the exact nature of the secondary influence, and even when it is known, a good measure of it may not have been collected. In this situation, our shared uniqueness strategy might prove quite helpful; a brief simulation sketch at the end of this section illustrates its logic.
In addition to recommendations concerning how to create item parcels and identify potentially damaging secondary influences, results of the current study also suggest that alternative interpretations of large error terms (δ or ε) should be considered when interpreting the SEM output from a partial disaggregation model. Concern over a uniqueness term for one composite indicator that is substantially larger than the others may tempt one to create and use instead a set of item parcels that generates more uniform uniqueness estimates. However, results from the current study suggest that this large uniqueness may be the key to successfully removing the unmodeled secondary construct from the estimation of the primary latent trait.
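The following minimal sketch illustrates the shared uniqueness logic. It uses secondary loadings similar to those in our simulations, but the simple correlation check is an illustration of the mechanism added here, not a substitute for the full SEM analyses reported above.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000                         # large n so sample correlations are stable
ksi1 = rng.normal(size=n)           # primary (modeled) construct
ksi2 = rng.normal(size=n)           # secondary (unmodeled) construct

# Items 1 and 2 are contaminated by the secondary construct; items 3-6 are clean.
secondary_loadings = (0.50, 0.40, 0.0, 0.0, 0.0, 0.0)
x = [0.65 * ksi1 + c * ksi2 + rng.normal(size=n) for c in secondary_loadings]

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Shared uniqueness strategy: both contaminated items go into one parcel.
shared = [x[0] + x[1], x[2] + x[3], x[4] + x[5]]
# Distributed strategy: the contamination is spread across two parcels.
spread = [x[0] + x[2], x[1] + x[3], x[4] + x[5]]

print("shared:", [round(corr(p, ksi2), 3) for p in shared])
print("spread:", [round(corr(p, ksi2), 3) for p in spread])
# Under the shared strategy, only the first parcel carries the secondary
# variance, which the measurement model can absorb as that parcel's
# uniqueness; distributing the items contaminates two of the three parcels
# and hence the common factor they define.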
Conclusions
The results of this study show that different combinations of items into parcels can affect the results of SEM analyses in three important ways: by (a) biasing parameter estimates, (b) creating variability in fit indicators, and (c) redefining the nature of the primary construct. A strategy of combining items that share an unmodeled secondary influence into the same parcel may help mitigate these effects. To the extent that users of SEM are aware that the choice of a parceling strategy may introduce bias, the quality of their analyses will be improved.
APPENDIX
PRELIS Program Lines for Simulations 1 to 3

TITLE Generate item level data for item parcel study
DA NO = 500 RP = 500 CO ALL
;
!Create latent constructs and relations among them
NE KSI1 = NRAND; NE KSI2 = NRAND; NE KSI3 = .60 * KSI1 + NRAND
{Simulation 3: NE KSI3 = .60 * KSI1 + .20 * KSI2 + NRAND}
;
!Generate random errors for manifest indicators
NE DELTA1 = URAND; NE DELTA2 = URAND; NE DELTA3 = URAND
NE DELTA4 = URAND; NE DELTA5 = URAND; NE DELTA6 = URAND
NE DELTA7 = URAND; NE DELTA8 = URAND; NE DELTA9 = URAND
NE DELTA10 = URAND
;
!Generate item-level values for indicators of IV (X1 - X6) and DV (X7 - X10)
NE X1 = .65 * KSI1 + .00 * KSI2 + DELTA1
{Simulation 2, 3: NE X1 = .65 * KSI1 + .50 * KSI2 + DELTA1}
NE X2 = .65 * KSI1 + .00 * KSI2 + DELTA2
{Simulation 2, 3: NE X2 = .65 * KSI1 + .40 * KSI2 + DELTA2}
NE X3 = .65 * KSI1 + .00 * KSI2 + DELTA3
NE X4 = .65 * KSI1 + .00 * KSI2 + DELTA4
NE X5 = .65 * KSI1 + .00 * KSI2 + DELTA5
NE X6 = .65 * KSI1 + .00 * KSI2 + DELTA6
NE X7 = .70 * KSI3 + DELTA7
NE X8 = .70 * KSI3 + DELTA8
NE X9 = .70 * KSI3 + DELTA9
NE X10 = .70 * KSI3 + DELTA10
;
!Create item parcels
NE TEST12 = X1 + X2; NE TEST13 = X1 + X3; NE TEST14 = X1 + X4
NE TEST15 = X1 + X5; NE TEST16 = X1 + X6; NE TEST23 = X2 + X3
NE TEST24 = X2 + X4; NE TEST25 = X2 + X5; NE TEST26 = X2 + X6
NE TEST34 = X3 + X4; NE TEST35 = X3 + X5; NE TEST36 = X3 + X6
NE TEST45 = X4 + X5; NE TEST46 = X4 + X6; NE TEST56 = X5 + X6
;
SD KSI1 - KSI3 DELTA1 - DELTA10 X1 - X6
OU CM = {filename}
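For readers working outside PRELIS, the following is a rough Python analogue of the program above (Simulation 1 paths shown; the Simulation 2 and 3 overrides are noted in comments). It is a sketch under two assumptions: that PRELIS's NRAND and URAND correspond to standard normal and uniform draws, respectively, and that a unit-width uniform error is an acceptable stand-in, because the exact scaling PRELIS applies to URAND variables is not reproduced here.

import numpy as np

rng = np.random.default_rng(42)
NO, RP = 500, 500   # cases per replication and number of replications (DA line)

def one_replication():
    # Latent constructs and their relations (Simulation 3 adds + .20 * ksi2 to ksi3)
    ksi1 = rng.normal(size=NO)
    ksi2 = rng.normal(size=NO)
    ksi3 = 0.60 * ksi1 + rng.normal(size=NO)
    # Uniform errors standing in for URAND (scale is an assumption; see text)
    delta = rng.uniform(-1.0, 1.0, size=(10, NO))
    x = np.empty((10, NO))
    # IV indicators X1-X6; Simulations 2 and 3 add .50 * ksi2 to X1 and .40 * ksi2 to X2
    for i in range(6):
        x[i] = 0.65 * ksi1 + delta[i]
    # DV indicators X7-X10
    for i in range(6, 10):
        x[i] = 0.70 * ksi3 + delta[i]
    # All 15 two-item parcels of the IV indicators (TEST12 ... TEST56)
    parcels = {f"TEST{i + 1}{j + 1}": x[i] + x[j]
               for i in range(6) for j in range(i + 1, 6)}
    return x, parcels

x, parcels = one_replication()                      # one of RP replications
cm = np.cov(np.vstack(list(parcels.values())))      # analogous to OU CM = {filename}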
References
Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-172.
Baehr, M. E., & Williams, G. B. (1967). Underlying dimensions of personal background data and their relationship to occupational classification. Journal of Applied Psychology, 51, 481-490.
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational research. Organizational Research Methods, 1, 45-87.
Bentler, P. M. (1989). EQS structural equations program manual. Los Angeles: BMDP Statistical Software.
Bentler, P. M., & Chou, C. P. (1987). Practical issues in structural modeling. Sociological Methods and Research, 16, 78-117.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley.
Borman, W. C. (1979). Format and training effects on rating accuracy and rater errors. Journal of Applied Psychology, 64, 410-421.
Cattell, R. B. (1956). Validation and intensification of the sixteen personality factors questionnaire. Journal of Clinical Psychology, 12, 205-214.
Cattell, R. B., & Burdsal, C. A., Jr. (1975). The radial parcel double factoring design: A solution to the item-vs-parcel controversy. Multivariate Behavioral Research, 10, 165-179.
DeShon, R. P. (1998). A cautionary note on measurement error corrections in structural equation models. Psychological Methods, 3, 412-423.
Ding, L., Velicer, W. F., & Harlow, L. L. (1995). Effects of estimation methods, number of indicators per factor, and improper solution on structural equation modeling fit indices. Structural Equation Modeling, 2, 119-144.
Drasgow, F. (1995). Some comments on Labouvie and Ruetsch. Multivariate Behavioral Research, 30, 83-85.
Gerbing, D. W., & Anderson, J. C. (1988). An updated paradigm for scale development incorporating unidimensionality and its assessment. Journal of Marketing Research, 25, 186-192.
James, L. R. (1980). The unmeasured variables problem in path analysis. Journal of Applied Psychology, 65, 415-421.
James, L. R., Mulaik, S. A., & Brett, J. M. (1982). Causal analysis: Assumptions, models, and data. Beverly Hills, CA: Sage.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A guide to the program and applications. Chicago: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (1993). New features in LISREL 8. Chicago: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (1994). Simulation with PRELIS 2 and LISREL 8. Chicago: Scientific Software International.
Kishton, J. M., & Widaman, K. F. (1994). Unidimensional versus domain representative parceling of questionnaire items: An empirical example. Educational and Psychological Measurement, 54, 757-765.
Labouvie, E., & Ruetsch, C. (1995). Testing for equivalence of measurement scales: Simple structure and metric invariance reconsidered. Multivariate Behavioral Research, 30, 63-76.
Lautenschlager, G. J. (1989). A comparison of alternatives to conducting Monte Carlo analyses for determining parallel analysis criteria. Multivariate Behavioral Research, 24, 365-395.
Lawrence, I. M., & Dorans, N. J. (1987, April). An assessment of the dimensionality of SAT-Mathematical. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, DC.
Manhart, J. J. (1996, April). Factor analytic methods for determining whether multiple-choice and constructed-response tests measure the same construct. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.
Marsh, H. W. (1994). Confirmatory factor analysis models of factorial invariance: A multifaceted approach. Structural Equation Modeling, 1, 5-34.
Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181-220.
McDonald, R. P. (1995). Testing for equivalence of measurement scales: A comment. Multivariate Behavioral Research, 30, 87-88.
Nesselroade, J. R. (1995). ". . . and expectation fainted, longing for what it had not": Comments on Labouvie and Ruetsch's "Testing for equivalence . . ." Multivariate Behavioral Research, 30, 95-99.
Owens, W. A. (1976). Background data. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 609-644). Chicago: Rand McNally.
Schau, C., Stevens, J., Dauphinee, T. L., & Vecchio, A. D. (1995). The development and validation of the survey of attitudes toward statistics. Educational and Psychological Measurement, 55, 868-875.
Snell, A. F., Hall, R. J., Davies, G. M., & Keeney, M. J. (1999, April). The implications of secondary factors for the use of item parcels in structural equation modeling. Paper presented at the 14th annual meeting of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Snell, A. F., Hall, R. J., & Foust, M. S. (1997, August). Are testlets created equal?: Examining testlet construction strategies in SEM. Paper presented at the annual meeting of the Academy of Management, Boston.
Tanaka, J. S. (1987). How big is enough?: Sample size and goodness of fit in structural equation models with latent variables. Child Development, 58, 134-146.
Thompson, B., & Melancon, J. G. (1996, November). Using item testlets/parcels in confirmatory factor analysis: An example using the PPSDQ-78. Paper presented at the annual meeting of the Mid-South Educational Research Association, Tuscaloosa, AL.
Tremblay, P. F., & Gardner, R. C. (1996). On the growth of structural equation modeling in psychological journals. Structural Equation Modeling, 3, 93-104.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.
Widaman, K. F. (1995). On methods for comparing apples and oranges. Multivariate Behavioral Research, 30, 101-106.
Williams, L. J., & Anderson, S. E. (1994). An alternative approach to method effects by using latent-variable models: Applications in organizational behavior research. Journal of Applied Psychology, 79, 323-331.
Williams, L. J., & Holahan, P. J. (1994). Parsimony-based fit indices for multiple-indicator models: Do they work? Structural Equation Modeling, 1, 161-189.
Rosalie J. Hall is an assistant professor in the Department of Psychology at the University of Akron. She received her Ph.D. in Industrial/Organizational Psychology from the University of Maryland. Her current research interests include interpersonal perception in organizational settings and applied research methods.

Andrea F. Snell is an assistant professor in the Department of Psychology at the University of Akron. She received her Ph.D. in Human Differences and Measurement Psychology from the University of Georgia. Her research interests include the development of noncognitive selection measures and the assessment of response distortion effects.

Michelle Singer Foust is a doctoral candidate in Industrial/Organizational Psychology at the University of Akron. Her research interests include employee lateness behaviors and advanced statistical methodologies.