Best Practices in the Use of Bifactor Models: Conceptual Grounds, Fit Indices and Complementary Indicators

Bifactor models have gained increasing popularity in the literature concerned with personality, psychopathology and assessment. Empirical studies using bifactor analysis generally judge the estimated model using SEM model fit indices, which may lead to erroneous interpretations and conclusions. To address this problem, several researchers have proposed multiple criteria to assess bifactor models, such as a) conceptual grounds, b) overall model fit indices, and c) specific bifactor model indicators. In this article, we provide a brief summary of these criteria. An example using data gathered from a recently published research article is also provided to show how taking into account all criteria, rather than solely SEM model fit indices, may prevent researchers from drawing wrong conclusions.


Introduction
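Despite the increasing use of bifactor models in psychology and related sciences, several researchers have noted common errors in published bifactor analyses, particularly the interpretation of model fit based solely on SEM (structural equation modeling) fit indices (Gignac, 2016; Rodriguez, Reise, & Haviland, 2016). Based on a literature review, the aim of the present article is to recommend multiple criteria for assessing bifactor models, including a) conceptual grounds, b) overall model fit indices, and c) additional, specific bifactor model indicators. Using data from a recent publication on the factorial structure of the Positive and Negative Affect Schedule (Seib-Pfeifer, Pugnaghi, Beauducel, & Leue, 2017), we illustrate the errors and misinterpretations that can result from relying exclusively on SEM model fit indices without taking into account the theoretical basis and/or specific bifactor model fit indices.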

Bifactor Model: Basic Issues
Within the framework of confirmatory factor analysis, albeit not exclusive to this approach, bifactor models (BM) are one of the options available to the applied researcher for simultaneously testing the extent to which a particular set of items is explained by a general underlying factor and by group-level factors. In doing so, a BM provides a means to determine whether the construct measured by a scale can be viewed primarily as unidimensional or multidimensional. Bifactor models should be distinguished from hierarchical models with higher-order factors: in the latter, the group-level factors represent dimensions of a general factor, whereas in a BM the group-level factors are hypothesized to be independent (i.e., orthogonal) and are not lower-order factors of the general factor (see Figure 1). Accordingly, a BM is especially useful for assessing the validity of an instrument intended to measure both the overall construct and its specific dimensions (Rodriguez et al., 2016).
Given that a BM tends to outperform conventional confirmatory factor analysis (CFA) models simply because of the way in which it is specified (Gignac, 2016; Morgan, Hodge, Wells, & Watkins, 2015), when a particular BM yields adequate model fit indices it is highly recommended to use complementary statistical indices for a more accurate interpretation. These additional, BM-specific indices are the Explained Common Variance (ECV), the Percentage of Uncontaminated Correlations (PUC), the omega coefficients (omega, ω; omega subscale, ωs; omega hierarchical, ωh; and omega hierarchical subscale, ωhs), and the H coefficient (Rodriguez et al., 2016).
The ECV represents the proportion of the common variance attributable to the general factor. The ωh, in turn, reflects the proportion of the total variance explained by the general factor, while the ωhs reflects the proportion of the total variance accounted for by each specific factor after controlling for the influence of the general factor. High values of ECV (> .60) and ωh (> .70) indicate that the variance of the indicators is substantially accounted for by the general factor and, therefore, that retaining the specific factors would be hard to justify. Nonetheless, some authors (Bonifay, Lane, & Reise, 2017) have pointed out that if ωhs accounts for a non-redundant amount of variance (> .30), the specific factors could be retained along with the general factor, provided that the theoretical underpinnings of such a decision are considered. There is also an item-level application of ECV (ECV-I; Stucky, Thissen, & Edelen, 2013), which is interpreted in the same way as ECV.
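The definitions above can be made concrete with a minimal Python sketch. It assumes standardized loadings from a bifactor solution with orthogonal factors, with each item loading on the general factor and on exactly one group factor, and it uses the usual loading-based formulas described by Rodriguez et al. (2016). The function names and the six loadings are hypothetical and serve only to illustrate the arithmetic; they are not taken from any study.

```python
# Bifactor strength indices from standardized loadings (orthogonal factors assumed).
# The loadings below are HYPOTHETICAL and only illustrate the arithmetic.

def ecv(general, specific):
    """Explained common variance: share of common variance due to the general factor."""
    g2 = sum(l * l for l in general)
    s2 = sum(l * l for l in specific)
    return g2 / (g2 + s2)

def ecv_item(general, specific):
    """Item-level ECV (ECV-I): one value per item."""
    return [g * g / (g * g + s * s) for g, s in zip(general, specific)]

def _uniqueness(general, specific):
    """Residual variance of each standardized item: 1 - communality."""
    return [1.0 - g * g - s * s for g, s in zip(general, specific)]

def omega_h(general, specific, groups):
    """Omega hierarchical: share of total-score variance due to the general factor."""
    gen = sum(general) ** 2
    grp = sum(
        sum(s for s, k in zip(specific, groups) if k == label) ** 2
        for label in set(groups)
    )
    err = sum(_uniqueness(general, specific))
    return gen / (gen + grp + err)

def omega_hs(general, specific, groups, label):
    """Omega hierarchical subscale: share of subscale-score variance due to
    the group factor `label`, after removing the general factor."""
    idx = [i for i, k in enumerate(groups) if k == label]
    g = [general[i] for i in idx]
    s = [specific[i] for i in idx]
    gen = sum(g) ** 2
    grp = sum(s) ** 2
    err = sum(_uniqueness(g, s))
    return grp / (gen + grp + err)

# Hypothetical example: six items, two group factors ("A" and "B") of three items each.
general = [0.7] * 6
specific = [0.3] * 6
groups = ["A", "A", "A", "B", "B", "B"]

print(round(ecv(general, specific), 3))
print(round(omega_h(general, specific, groups), 3))
print(round(omega_hs(general, specific, groups, "A"), 3))
```

With these illustrative loadings the general factor dominates (ECV of about .84, ωh of about .81) and the group factors add little unique variance (ωhs of about .125), which is the pattern that would argue against retaining the specific factors.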
The PUC indicates the percentage of correlations not contaminated by multidimensionality (i.e., by group-level factors) and moderates or supports the interpretation of ECV. According to Rodriguez et al. (2016), when ECV > .70 and PUC > .70, the common variance can be regarded as essentially unidimensional, thus supporting the general factor. Finally, the H coefficient is a measure of construct replicability and is defined as the extent to which a set of items represents a latent variable (Hancock & Mueller, 2001). This coefficient is based on the ratio of the variance explained by the latent variable to the variance left unexplained by it. High H values (> .80) suggest that the latent variable is well defined and adequately represented by the observed indicators, and is thus more likely to be replicated across studies.
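As a brief illustration of these two indices, the sketch below computes PUC from the number of items per group factor and H from a factor's standardized loadings, using the formula H = 1 / (1 + 1 / Σ[λ² / (1 − λ²)]) from Hancock and Mueller (2001). The function names are hypothetical. The PUC example assumes the standard 20-item PANAS with 10 PA and 10 NA items; the loadings passed to the H function are made up for illustration.

```python
def puc(group_sizes):
    """Percentage of uncontaminated correlations.

    A correlation between two items is 'contaminated' when both items load on
    the same group factor; all between-group correlations reflect only the
    general factor.
    """
    n = sum(group_sizes)
    total = n * (n - 1) // 2                                  # all unique item pairs
    within = sum(k * (k - 1) // 2 for k in group_sizes)       # pairs sharing a group factor
    return (total - within) / total

def construct_h(loadings):
    """Construct replicability H for one factor from its standardized loadings."""
    ratio = sum(l * l / (1.0 - l * l) for l in loadings)      # explained / unexplained, summed
    return 1.0 / (1.0 + 1.0 / ratio)

# Two group factors with 10 items each (assumed PANAS layout).
print(round(puc([10, 10]), 2))

# HYPOTHETICAL loadings, for illustration only.
print(round(construct_h([0.7] * 6), 2))
```

Because PUC depends only on how the items are distributed across group factors, it can be computed before any model is fitted; two group factors of 10 items each yield 100 uncontaminated pairs out of 190.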
Along with complementary model fit indices, theoretically based reasoning, rather than overall model fit alone, should guide the choice of a factor model (Morgan et al., 2015), so that the interpretation of the resulting factors is conceptually well founded.

Seib-Pfeifer et al. (2017) concluded that the BM comprising positive affect (PA), negative affect (NA), and a general Affective Polarity factor had a better model fit than alternative two- and three-factor models. They made this assertion even though relevant BM statistical indices (e.g., ECV, ωh, ωhs) were not provided. Indeed, when the BM statistical indices are calculated, the conclusions differ markedly from those of Seib-Pfeifer et al. (2017). In particular, the results show that the common variance accounted for by the general Affective Polarity factor is rather weak (ECV = 10.8%), as is the variance at the item level (mean ECV-I = .07 for PA and .16 for NA). The mean loading on the specific factors is moderate to high for both PA (mean λ = .52) and NA (mean λ = .57), and a substantial amount of variance is also explained by the two specific factors (ωhs = .78 for PA and .72 for NA). In contrast, both the factor loadings and the explained variance of the general Affective Polarity factor are close to zero (mean λ = .091; ωh = .049). In addition, the PUC value is .53, and the H coefficient is acceptable for the group-level factors (H = .80 for PA and .77 for NA) and low for the general factor (H = .41). All in all, these findings suggest that the conclusion drawn by Seib-Pfeifer et al. (2017), i.e., that the BM provides the most appropriate representation of the PANAS, is erroneous.
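The reasoning applied to the PANAS reanalysis can be summarized in a small screening helper. This is only a sketch: the function and key names are hypothetical, the cutoffs are the rules of thumb cited in this article (ECV > .60 or > .70, ωh > .70, PUC > .70, ωhs > .30, H > .80), and such cutoffs are guides that do not replace theoretical judgment. The input values are the ones reported above for the PANAS bifactor model.

```python
def screen_bifactor(ecv, omega_h, puc, omega_hs, h_general):
    """Apply the rule-of-thumb cutoffs cited in the text to a set of bifactor indices.

    `omega_hs` maps each group-factor name to its omega hierarchical subscale value.
    Returns a dict of simple flags; it is a screening aid, not a decision procedure.
    """
    return {
        # Rodriguez et al. (2016): ECV and PUC both above .70
        "essentially_unidimensional": ecv > 0.70 and puc > 0.70,
        # High ECV (> .60) together with high omega hierarchical (> .70)
        "strong_general_factor": ecv > 0.60 and omega_h > 0.70,
        # Group factors that explain non-redundant variance (omega_hs > .30)
        "group_factors_with_unique_variance": [
            name for name, value in omega_hs.items() if value > 0.30
        ],
        # Hancock & Mueller (2001): H above .80 suggests a replicable construct
        "general_factor_replicable": h_general > 0.80,
    }

# Values reported in the text for the PANAS bifactor model.
panas = screen_bifactor(
    ecv=0.108,
    omega_h=0.049,
    puc=0.53,
    omega_hs={"PA": 0.78, "NA": 0.72},
    h_general=0.41,
)
for key, value in panas.items():
    print(key, value)
```

For these values, every flag concerning the general factor is negative while both group factors are flagged as carrying unique variance, which matches the verbal conclusion that a general Affective Polarity factor is not supported.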

Conclusion
In a strict sense, a factor is a mathematical abstraction derived from the empirical covariance between a set of variables, which may (or may not) be interpreted as a common, substantive cause underlying a set of observable behaviors. The bifactor model is an alternative specification of the second-order factor model. In contrast to second-order models, the bifactor model a) allows for the quantification of the direct effect of the general factor on the observed variables, without requiring that relationship to be fully mediated by group factors, and b) facilitates independent evaluation of the merits of the general and group factors. Underlying this model is the hypothesis that there is a general factor with a causal influence on all items (Arias, Jenaro, & Ponce, 2018). Sometimes, however, the variance shared between different indicators may be artifactual, reflecting the valence of the items (Bäckström & Björklund, 2016) or general characteristics of the respondents that are not directly related to the construct, such as self-esteem (Davies, Connelly, Ones, & Birkland, 2015). These sources of shared variance may increase the fit of the bifactor model, leading to the erroneous conclusion that there is a general factor that is psychologically and psychometrically meaningful.
Bifactor models constitute a useful analytic method for the assessment of construct validity in psychological measurement. However, researchers should be wary when performing bifactor analysis and avoid using SEM model fit indices as the main or only criteria for judging the feasibility of a BM, since conclusions based on such criteria can be misleading. Complementary statistical indices provide researchers with relevant information concerning the salience of the general and group-level factors. Such information, together with the theoretical background, allows for the selection of the most appropriate factor model and helps ensure the validity of the inferences drawn from the scale scores.

Figure 1. Graphical representations of a higher-order factor model and a bifactor model.