BifactorCalc: An Online Calculator for Ancillary Measures of Bifactor Models

The bifactor model allows researchers to examine whether a total score is supported in a data set by modeling a general factor and two or more specific factors that are orthogonal to one another. These models tend to overestimate goodness of fit as judged by traditional indices (e.g., CFI, RMSEA, SRMR); hence, auxiliary measures exist for examining dimensionality (ECV_Gen, ECV_Specific, I-ECV, PUC, ARPB) and reliability (ω, ω_S, ω_H, ω_HS, PRV, H, and FD). The present study describes the operation, mathematical foundations, and application in psychological research of an online calculator called BifactorCalc. The results show that BifactorCalc is an online, user-friendly computer program for calculating the different auxiliary measures of bifactor models. It is concluded that BifactorCalc calculates the auxiliary measures of bifactor models in three simple steps and generates a path diagram.


Introduction
In psychological evaluation, whether for diagnosis, intervention, or research, the objective is to measure a construct (e.g., anxiety, depression, or stress) and the dimensions that comprise it. In this context, it is common to assume that all items are influenced by the same latent variable, which is composed of specific factors integrated into a broad general factor (Dominguez-Lara & Rodriguez, 2017). Accordingly, the scores of the specific factors are usually summed to obtain a general score. However, it has recently been argued that this procedure requires empirical evidence, obtained through statistical modeling, that a general factor is present (Reise, 2012).
In the context of psychology, it is common to use structural equation models (SEM), which are statistical techniques that assume the presence of a latent variable underlying the items and constitute an important alternative for evaluating psychological constructs, which are generally multidimensional (Bonifay, Lane, & Reise, 2017).
Bifactor models (see Figure 1) belong to the so-called hierarchical models (Canivez, 2016; Reise, 2012), also called nested factor models (Gustafsson & Balke, 1993) and direct hierarchical models (Gignac, 2008). Their main characteristic is that they evaluate the simultaneous effect of a general factor (GF) and specific factors (e.g., F1 and F2) on a set of indicators (Flores-Kanter, Dominguez-Lara, Trógolo, & Medrano, 2018). The specific factors (SFs) are assumed to be orthogonal to each other (DeMars, 2013) because the variance shared between them is attributed to the general factor (Reise, 2012). Thus, the GF is expected to explain a greater share of the items' variance than the SFs. Although the bifactor model was originally described in the late 1930s (Holzinger & Swineford, 1937), it has been rediscovered in recent years (Reise, 2012) and is increasingly used in psychological research conducted in diverse cultural contexts (Anderson & Marcus, 2019; Montes & Sanchez, 2019; Vuyk & Codas, 2019). Nevertheless, the bifactor model is not immune to criticism. Evaluating bifactor models with SEM techniques and traditional goodness-of-fit indices alone (e.g., CFI, RMSEA) can lead to false positives, since these indices do not evaluate the influence of the general and specific factors on the items (Bonifay et al., 2017; Dominguez-Lara & Rodriguez, 2017; Flores-Kanter et al., 2018). In fact, the evidence suggests that traditional goodness-of-fit indices may statistically favor bifactor models (Gignac, 2008; Morgan, Hodge, Wells, & Watkins, 2015).
Another important aspect is that the interchangeability of the specific factors in symmetric bifactor models (see Figure 1) is a prerequisite for its correct interpretation and the avoidance of anomalous models (see also Eid, Geiser, Koch, & Heene, 2017). The specialized literature provides some examples of the correct use of the symmetric and structurally different bifactor models for the Beck Depression Inventory-II (Heinrich, Zagorscak, Eid, & Knaevelsrud, 2018) and ADHD/ODD symptoms (Burns, Geiser, Servera, Becker, & Beauchaine, 2019).
In this context, a set of auxiliary measures is needed to allow for a better evaluation of the bifactor model, along with software that calculates all these measures quickly and easily. Currently, Excel® sheets are available (Dueber, 2017), and an R package called "BifactorIndicesCalculator" (Dueber, 2020) has recently been released. The latter requires programming skills that are still not common among psychology professionals (comparative information in Table 1). This increases the need to develop a computer program that calculates the auxiliary measures of bifactor models and provides a diagram with the factor loadings entered. Accordingly, the objective of this research is to develop software called BifactorCalc that allows the auxiliary measures of bifactor models to be calculated in an easy, user-friendly way.

Omega Coefficients
For a bifactor structure, four types of omega coefficients can be calculated: total omega (ω), subscale omega (ω_S), hierarchical omega (ω_H), and hierarchical omega for the subscale (ω_HS).
The omega coefficient (ω; McDonald, 2013) estimates what proportion of variance in the total observed score can be attributed to all common sources of variance (Reise, Bonifay, & Haviland, 2013). The ω is based on the factor loadings of a factorial model. Unlike other coefficients such as alpha, which is based on the assumption of equal loadings (tau-equivalent models), the omega coefficient is appropriate for cases in which the loadings of the items vary (congeneric models), a recommendation that is supported by several authors (Dunn, Baguley, & Brunsden, 2013; Rodriguez, Reise, & Haviland, 2015). The calculation of omega is as follows:
$$\omega = \frac{\left(\sum \lambda_{GEN}\right)^2 + \sum_{k}\left(\sum \lambda_{grp_k}\right)^2}{\left(\sum \lambda_{GEN}\right)^2 + \sum_{k}\left(\sum \lambda_{grp_k}\right)^2 + \sum\left(1 - h^2\right)} \tag{1}$$
Here λ_GEN denotes the loadings on the general factor, λ_grp_k the loadings on specific factor k, and h² the communality of each item, so that 1 − h² is its unique variance. In the formula, the numerator expresses all common sources of variation of the total weighted score, and the denominator adds the unique variance to those common sources, giving the total variance of the score. High values of ω indicate high multidimensional composite reliability.
In the same way, the omega coefficient can be calculated for the specific factors (ω_S) from the factor loadings and errors corresponding to the set of items that comprise each subscale. The following formula is used to calculate ω_S, in which the variance of the general factor and that of the specific factor are combined to estimate reliability, with all sums taken over the items of subscale k:
$$\omega_S = \frac{\left(\sum \lambda_{GEN}\right)^2 + \left(\sum \lambda_{grp_k}\right)^2}{\left(\sum \lambda_{GEN}\right)^2 + \left(\sum \lambda_{grp_k}\right)^2 + \sum\left(1 - h^2\right)} \tag{2}$$
Both the ω and ω_S coefficients reflect the systematic variation, attributed to various common factors, that affects weighted composite scores (Rodriguez, Reise, & Haviland, 2016). In this context, it is important to determine the relative weight of the different factors that determine the variance of the composite scores. To that end, some alternative indices have been developed: hierarchical omega (ω_H) and hierarchical omega for the subscale (ω_HS). Both ω_H and ω_HS reflect the variance attributed to a single latent variable (Rodriguez et al., 2015).
Specifically, ω_H estimates the proportion of variance of the total scores that can be attributed to a single general factor, and it is calculated by dividing the squared sum of the factor loadings on the general factor by the variance of the total scores (Reise, Moore, & Haviland, 2013):
$$\omega_H = \frac{\left(\sum \lambda_{GEN}\right)^2}{\left(\sum \lambda_{GEN}\right)^2 + \sum_{k}\left(\sum \lambda_{grp_k}\right)^2 + \sum\left(1 - h^2\right)} \tag{3}$$
The ω_H is sensitive to the number of items. A greater number of items is associated with an increase in ω_H, which is also affected by the relative size of each item's loading on the general factor versus the specific factors (Rodriguez et al., 2015). A high ω_H (ω_H > .80) indicates that the scores can be considered essentially unidimensional, since the general factor is the main source of systematic variance compared with the influence of the specific factors.
The calculation of ω_H can be extended to subscales through the hierarchical omega for the subscale (ω_HS), which reflects the proportion of systematic variance of a subscale score after separating the variability attributed to the general factor (Reise, Bonifay et al., 2013). The ω_HS is calculated from the following formula, with all sums taken over the items of subscale k:
$$\omega_{HS} = \frac{\left(\sum \lambda_{grp_k}\right)^2}{\left(\sum \lambda_{GEN}\right)^2 + \left(\sum \lambda_{grp_k}\right)^2 + \sum\left(1 - h^2\right)} \tag{4}$$
Some cut-off points used in psychology can serve as a reference: ω_HS ≥ .30 is substantial; .20 ≤ ω_HS < .30 is moderate; and ω_HS < .20 is low (Smits et al., 2014).
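The four omega coefficients described in this section can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code of BifactorCalc: the loadings are hypothetical, the function name `omega_coefficients` is invented for this example, and the model is assumed to be standardized and orthogonal, with each item loading on the general factor and on exactly one specific factor.

```python
import numpy as np

# Hypothetical standardized loadings (illustration only, not from the article).
general = np.array([0.7, 0.7, 0.7, 0.5, 0.5, 0.5])
# One column per specific factor; a zero means the item does not load on it.
specific = np.array([
    [0.3, 0.0],
    [0.3, 0.0],
    [0.3, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
    [0.0, 0.5],
])

def omega_coefficients(general, specific):
    """Return (omega, omega_h, omega_s, omega_hs) for an orthogonal bifactor model."""
    communality = general**2 + (specific**2).sum(axis=1)
    unique = 1.0 - communality                  # 1 - h^2 for each item
    gen_part = general.sum() ** 2               # (sum of general loadings)^2
    spec_parts = specific.sum(axis=0) ** 2      # (sum of loadings)^2 per specific factor
    total_var = gen_part + spec_parts.sum() + unique.sum()
    omega = (gen_part + spec_parts.sum()) / total_var
    omega_h = gen_part / total_var

    omega_s, omega_hs = [], []
    for k in range(specific.shape[1]):
        items = specific[:, k] != 0             # items that belong to subscale k
        gen_k = general[items].sum() ** 2
        spec_k = specific[items, k].sum() ** 2
        var_k = gen_k + spec_k + unique[items].sum()
        omega_s.append((gen_k + spec_k) / var_k)
        omega_hs.append(spec_k / var_k)
    return omega, omega_h, np.array(omega_s), np.array(omega_hs)

omega, omega_h, omega_s, omega_hs = omega_coefficients(general, specific)
print(round(omega, 3), round(omega_h, 3), omega_s.round(3), omega_hs.round(3))
```

With these hypothetical loadings, ω_HS is clearly higher for the second subscale, whose items load as strongly on their specific factor as on the general factor.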

Percentage of Reliable Variance (PRV)
The percentage of reliable variance (PRV) is an indicator based on the logic of the bifactor model because it considers the variance explained by the general factor (Hammer et al., 2018). This index is the ratio of ω_H to ω and, therefore, can be conceptually understood as the percentage of the total reliability that can be attributed to the reliability of the general factor (Reise, Moore, et al., 2013). See the following equation:
$$PRV = \frac{\omega_H}{\omega} \tag{5}$$
Some authors propose PRV > 50% as a provisional cut-off point, which would indicate that more than half of the reliable variation in the test score is produced by the general factor (or by the specific one, in which case ω_HS replaces ω_H in the numerator and ω_S replaces ω in the denominator; Li, 2015).
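As a worked illustration, the ratio can be checked by hand with the rounded coefficients reported for the Ethnic Identity Scale later in this article (ω = .93 and ω_H = .81 for the total score; ω_S = .93 and ω_HS = .22 for specific factor 1). Because the inputs are rounded to two decimals, hand calculations may differ in the last digit from the values produced by the calculator.

```latex
\begin{align*}
PRV_{GEN} &= \frac{\omega_H}{\omega} = \frac{.81}{.93} \approx .87 \\
PRV_{S_1} &= \frac{\omega_{HS_1}}{\omega_{S_1}} = \frac{.22}{.93} \approx .24
\end{align*}
```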

Explained Common Variance (ECV)
The explained common variance (ECV) is an indicator of unidimensionality and expresses the proportion of the common variance that can be attributed to the general factor (Reise, Moore, et al., 2013). For its calculation, the factor loadings of the general and specific factors of a bifactor model are used in the following mathematical expression:
$$ECV = \frac{\sum \lambda^2_{GEN}}{\sum \lambda^2_{GEN} + \sum_{k} \sum \lambda^2_{grp_k}} \tag{6}$$
where ∑λ²_GEN is the sum of the squared factor loadings on the general factor and ∑λ²_grp_k is the sum of the squared factor loadings on specific factor k. High ECV values, greater than .60, suggest that the common variance attributable to the specific factors is small compared with that of the general factor and, therefore, that the data would fit an essentially unidimensional model (Reise, Scheines, Widaman, & Haviland, 2013).
For example, it has been observed that, when the ECV is greater than .60, the correlation between the general factor and a criterion variable is not substantially affected if only the general factor is modeled and not the specific factors. In other words, high ECV values indicate that it is possible to use a unidimensional model even if the data fits better with a bifactor model. Other provisional cut-off points suggested by the literature are .70 or .80 (Rodriguez et al., 2016). However, the interpretation of ECV must be done in conjunction with that of the percentage of uncontaminated correlations (PUC), which is described in the following section (Reise, Scheines, et al., 2013).

In the case of the specific factors, a variant of the formula places the squared loadings of the specific factor in the numerator and the squared loadings of the general factor plus those of the specific factor in the denominator, with both sums taken over the items of specific factor k:
$$ECV_{S_k} = \frac{\sum \lambda^2_{grp_k}}{\sum \lambda^2_{GEN} + \sum \lambda^2_{grp_k}} \tag{7}$$
On the other hand, it is possible to obtain an ECV for each of the items (I-ECV) with the following mathematical expression:
$$I\text{-}ECV = \frac{\lambda^2_{GEN}}{\lambda^2_{GEN} + \lambda^2_{grp_k}} \tag{8}$$
I-ECV expresses the proportion of true variance of each item that is explained by the general factor (Stucky et al., 2013). Values greater than .85 suggest a strong influence of the general factor on the variance of the item.
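The three ECV variants described in this section can be sketched in Python as follows. This is an illustrative sketch, not the calculator's code: the loadings are hypothetical, the function name `ecv_indices` is invented here, and the specific-factor ECV is assumed to be computed only over the items of that factor, as the verbal description suggests.

```python
import numpy as np

# Hypothetical standardized loadings (illustration only, not from the article).
general = np.array([0.7, 0.7, 0.7, 0.5, 0.5, 0.5])
specific = np.array([
    [0.3, 0.0],
    [0.3, 0.0],
    [0.3, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
    [0.0, 0.5],
])

def ecv_indices(general, specific):
    """Return (ECV of the general factor, ECV per specific factor, I-ECV per item)."""
    gen_sq = general**2
    spec_sq = specific**2
    # Share of all common variance that is due to the general factor.
    ecv_general = gen_sq.sum() / (gen_sq.sum() + spec_sq.sum())
    ecv_specific = []
    for k in range(specific.shape[1]):
        items = specific[:, k] != 0          # items of specific factor k (assumption)
        spec_k = spec_sq[items, k].sum()
        ecv_specific.append(spec_k / (gen_sq[items].sum() + spec_k))
    # Item-level share of common variance due to the general factor.
    i_ecv = gen_sq / (gen_sq + spec_sq.sum(axis=1))
    return ecv_general, np.array(ecv_specific), i_ecv

ecv_general, ecv_specific, i_ecv = ecv_indices(general, specific)
print(round(ecv_general, 3), ecv_specific.round(3), i_ecv.round(3))
```

In this hypothetical case the general-factor ECV exceeds .60, but no item reaches the .85 I-ECV reference value, which shows why the indices are read together.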

Percentage of Uncontaminated Correlations (PUC)
The percentage of uncontaminated correlations (PUC; Reise, Scheines, et al., 2013) expresses the percentage of correlations that are not contaminated by multidimensionality (Rodriguez et al., 2015). In other words, it expresses what percentage of the total correlations between items occurs between items belonging to different specific factors. Therefore, the PUC together with the ECV provides information about the bias that results from forcing multidimensional data into unidimensional models. Its mathematical expression is presented below:
$$PUC = \frac{\dfrac{I_G (I_G - 1)}{2} - \sum_{k=1}^{n} \dfrac{I_{S_k} (I_{S_k} - 1)}{2}}{\dfrac{I_G (I_G - 1)}{2}} \tag{9}$$
where I_G is the number of items loaded onto the general factor and I_S1, I_S2, ..., I_Sn are the numbers of items loaded onto specific factors 1, 2, ..., n.
The interpretation of the PUC must be conducted in conjunction with the ECV. In practical terms, it has been suggested that when the PUC is greater than .80, the ECV value is not very relevant; on the other hand, when the PUC is less than .80, the ECV should be greater than .60 in order to treat the instrument as if it were unidimensional. From another perspective, it has been suggested that when ECV and PUC are both greater than .70, the scale can be treated as if it were unidimensional (Rodriguez et al., 2015).
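Because the PUC depends only on how many items load on each factor, it can be computed by counting item pairs. The following Python sketch assumes that every item loads on the general factor and on exactly one specific factor; the function name `puc` and the item counts are hypothetical.

```python
def puc(n_items_general, items_per_specific):
    """Proportion of item correlations that involve items from different specific factors."""
    total_pairs = n_items_general * (n_items_general - 1) / 2
    # Pairs within the same specific factor are "contaminated" by that factor.
    within_pairs = sum(n * (n - 1) / 2 for n in items_per_specific)
    return (total_pairs - within_pairs) / total_pairs

# Hypothetical scale: 6 items, two specific factors with 3 items each.
puc_two_factors = puc(6, [3, 3])
# Same 6 items split into three specific factors of 2 items each.
puc_three_factors = puc(6, [2, 2, 2])
print(puc_two_factors, puc_three_factors)
```

Splitting the same number of items across more specific factors raises the PUC, because fewer item pairs share a specific factor.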

Factor Determinacy (FD)
Often, researchers not only model a latent variable but also seek to estimate each individual's score on that latent variable. These individual scores are called factor scores and, in their simplest form, they correspond to the sum of the items belonging to a factor (DiStefano, Zhu, & Mîndrilǎ, 2009). There are, however, more refined methods, which are based on estimates from factor analysis.
Nevertheless, one problem with factor scores is that of the so-called indeterminacy. Although the details are technically complex, in simple terms this refers to the fact that, from the same factorial solution, it is possible to obtain very dissimilar and even contradictory factor scores. In this sense, FD expresses the multiple correlation between the observed variables (items) and the factor (Grice, 2001). This value can be obtained with the following formula (Beauducel, 2011):
$$FD = \operatorname{diag}\left(\Phi \Lambda^{\top} \Sigma^{-1} \Lambda \Phi\right)^{1/2} \tag{10}$$
where Λ is the matrix of factor loadings, Φ is the matrix of correlations between factors (the identity matrix when the factors are orthogonal), and Σ is the model-implied correlation matrix of the items. Under the conditions described in this work, this value is also equivalent to the correlation between factor scores and factors (Beauducel, 2011; Grice, 2001). For this reason, FD varies from 0 to 1, and values close to 1 indicate a better determinacy. Values higher than .80 have been suggested to allow an estimate of the general factor score (Gorsuch, 1983). However, other authors argue for a higher cut-off point (> .90; Grice, 2001; Rodriguez et al., 2015).
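Factor determinacy can be sketched with NumPy matrix operations. The code below is an illustration under explicit assumptions, not the calculator's implementation: the loadings are hypothetical, the factors are taken to be orthogonal (so the factor correlation matrix is the identity), and the item correlation matrix is the one implied by the model rather than an observed one.

```python
import numpy as np

# Hypothetical standardized loadings (illustration only, not from the article).
general = np.array([0.7, 0.7, 0.7, 0.5, 0.5, 0.5])
specific = np.array([
    [0.3, 0.0],
    [0.3, 0.0],
    [0.3, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
    [0.0, 0.5],
])

loadings = np.column_stack([general, specific])      # items x factors
phi = np.eye(loadings.shape[1])                      # orthogonal factors (assumption)
psi = np.diag(1.0 - (loadings**2).sum(axis=1))       # unique variances, 1 - h^2
sigma = loadings @ phi @ loadings.T + psi            # model-implied correlation matrix

# Multiple correlation between the items and each factor.
# solve(sigma, loadings) computes sigma^{-1} @ loadings without an explicit inverse.
fd = np.sqrt(np.diag(phi @ loadings.T @ np.linalg.solve(sigma, loadings) @ phi))
print(fd.round(3))   # order: general factor, specific factor 1, specific factor 2
```

Each value lies between 0 and 1 and can be compared with the .80 or .90 reference points mentioned above.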

Construct Replicability (H)
Another index that can help to better understand the quality of the measurement model is construct replicability (Mueller & Hancock, 2008). The H index can be used to assess whether the set of items representing a latent variable is adequate. It therefore indicates whether the latent variable is well defined and likely to replicate across studies. The H index is calculated from the following mathematical formula:
$$H = \left[1 + \left(\sum_{i=1}^{k} \frac{\lambda_i^2}{1 - \lambda_i^2}\right)^{-1}\right]^{-1} \tag{11}$$
As per the formula above, H is a function of the sum, across the k items of a factor, of the ratio of each squared factor loading (the proportion of item variance explained by the latent variable) to 1 minus that squared loading (Rodriguez et al., 2015). Thus, as the number of items and the size of the factor loadings increase, the H index approaches 1. Values of H greater than .70 suggest that the latent variable is well defined and is more likely to be stable in other studies (Dominguez-Lara, 2016), while low values suggest a poorly defined latent variable that is likely to change across studies. The ease of calculating and interpreting the H index makes it a convenient means of judging the viability of a measurement model based on a set of items.
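The verbal description of H translates directly into a few lines of Python. This is a minimal sketch with hypothetical loadings; the function name `construct_replicability` is invented for this example, and it assumes standardized loadings strictly between -1 and 1 with at least one nonzero value.

```python
def construct_replicability(loadings):
    """H index for one factor from its standardized loadings."""
    # Sum over items of: explained variance / unexplained variance.
    ratio = sum(l**2 / (1 - l**2) for l in loadings)
    return 1 / (1 + 1 / ratio)

# Hypothetical loadings (illustration only, not from the article).
h_general = construct_replicability([0.7, 0.7, 0.7, 0.5, 0.5, 0.5])
h_specific_1 = construct_replicability([0.3, 0.3, 0.3])
h_specific_2 = construct_replicability([0.5, 0.5, 0.5])
print(round(h_general, 3), round(h_specific_1, 3), round(h_specific_2, 3))
```

In this hypothetical case only the general factor exceeds the .70 reference value, the pattern the text describes for a scale dominated by its general factor.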

Average Relative Parameter Bias (ARPB)
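The ARPB is a measure for examining the difference between the factor loadings of a unidimensional model and the general factor loadings of the bifactor model (see Equation 12). According to some authors, a maximum difference of .12 to .15 may be acceptable (Rodriguez et al., 2015).
$$ARPB = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\lambda_{UNI,i} - \lambda_{GEN,i}}{\lambda_{GEN,i}} \right| \tag{12}$$
where λ_UNI,i is the loading of item i in the unidimensional model, λ_GEN,i is its loading on the general factor of the bifactor model, and n is the number of items.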
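The ARPB can be sketched as the mean absolute relative difference between the two sets of loadings. This illustration assumes the relative-bias definition (difference divided by the general factor loading) and nonzero general factor loadings; the function name `arpb` and the loadings are hypothetical.

```python
def arpb(unidimensional, general):
    """Average relative parameter bias between unidimensional and general loadings."""
    relative_bias = [abs(u - g) / abs(g) for u, g in zip(unidimensional, general)]
    return sum(relative_bias) / len(relative_bias)

# Hypothetical loadings (illustration only, not from the article).
general_loadings = [0.7, 0.7, 0.7, 0.5, 0.5, 0.5]
unidimensional_loadings = [0.77, 0.77, 0.77, 0.55, 0.55, 0.55]

arpb_value = arpb(unidimensional_loadings, general_loadings)
print(round(arpb_value, 3))
```

Here every unidimensional loading is 10% larger than its general factor counterpart, so the average relative bias is about .10.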

Average Factor Loading (mean)
A first approach to the bifactor model consists of the simple inspection of its factor loadings (Reise, Moore, & Haviland, 2010). If a scale has a strong general factor and a weak set of specific factors, then the factor loadings of the latter will be notably low, while the general factor loadings will tend to be higher. A simple way to examine this is to calculate the arithmetic mean of the item loadings on each factor. Following other authors, mean loadings lower than .30 in the specific factors can be considered secondary evidence of unidimensionality (Ferrando & Lorenzo-Seva, 2017).

Software Development

Description of BifactorCalc
The BifactorCalc calculator was developed in the Python programming language, in which all the previously presented formulas were implemented. For this purpose, the summation and matrix multiplication calculations were performed with the NumPy library (Harris et al., 2020). The Django framework was used to deploy the web project and build a user-friendly online interface, so that the software does not need to be installed on a computer (Django Software Foundation, 2019). The styles of the online interface were built with the Bootstrap web framework, which provided the visual characteristics of the buttons, colors, and frames; the tables in APA format; the distribution of the content on the screen; and all the other components displayed on the interface. Finally, the graphic construction of BifactorCalc and its integration with the calculations were performed with JavaScript.
To use BifactorCalc, a username and password are required (Figure 2); these can be requested from the authors of the article by email and allow users to store their bifactor models privately. Link to calculator: https://joseventuraleon.com/f/bifactorcalc
In the main menu there are two options (Figure 2): New Bifactor Model, to generate new models, and Logout, to exit the application. In the My Models section, the models entered by the user are displayed, identified by the name assigned to the general factor.
In the New Bifactor Model option, the factor loadings of the model are entered following the instructions provided in Steps 1, 2, and 3. The procedure must be followed so that the calculator receives the data correctly and the measures can be calculated. In Step 1, you must enter the name of the general factor, the items, and the general factor loadings. In addition, BifactorCalc allows for the entry of factor loadings from a unidimensional model to calculate the ARPB, which measures the difference between the general factor loadings of the bifactor model and the loadings of the unidimensional model. In Step 2, you must enter the names of the specific factors of the bifactor model. A maximum of two decimals should be used to enter the loadings. Finally, it is necessary to click on Step 3 to continue entering the information.
In addition, BifactorCalc provides a diagram (see Figure 4) that can be copied and used in the user's scientific manuscript.

Validation of BifactorCalc
To demonstrate the operation of the calculator, information from Yap et al. (2014) was used as a sample, and the similarity of the results obtained in BifactorCalc with the calculations of Rodriguez et al. (2016) was corroborated. First, the factor loadings of the bifactor model provided by Yap et al. (2014) for the Ethnic Identity Scale (EIS) were entered; in addition, unidimensional loadings were estimated from the inter-item correlation matrix with the R program (R Core Team, 2020). Second, the respective names were assigned to the specific and general factors (see Figure 3 on the left side). Third, the factor loadings of the specific factors were entered (see Figure 3 on the right side). Fourth, pressing the Finish Bifactor Model button automatically performs the calculation of the auxiliary measures, which appear in a new window, as shown in Figure 3.
As for the validation of BifactorCalc, the example that explains the "BifactorIndicesCalculator" package was run both in R (Dueber, 2020) and in BifactorCalc. The similarity of the compared results supports the correct functioning of the software.

Reporting BifactorCalc Results
The report of a bifactor model can be divided into two main stages: (a) dimensionality, which consists of using the indices ECV_Gen, ECV_Specific, I-ECV, PUC, and ARPB to determine whether the model is unidimensional or multidimensional; and (b) reliability, which consists of using the indices ω, ω_S, ω_H, ω_HS, PRV, H, and the general and specific FD.
Using the information in the example, the following can be reported: In relation to the dimensionality of the Ethnic Identity Scale, it was observed that it presents an ECV_Gen of .75, which suggests that the general factor explains 75% of the common variance of the items and points to a tendency towards unidimensionality (ECV > .60). In addition, ECV_Specific1 and ECV_Specific2 presented values of .26 and .24, which would indicate that the specific factors explain 26% and 24% of the common variance, respectively. In relation to the I-ECV, it was observed that only items 3, 11, 2, and 10 are strongly influenced by the general factor (I-ECV > .85). The PUC was equal to .53. Therefore, 53% of the correlations are uncontaminated by multidimensionality and reflect the general factor alone, leaving 47% of the correlations "contaminated" by the specific factors. Finally, the ARPB is equal to .07, which indicates that the general factor loadings of the bifactor model and the factor loadings of the unidimensional model differ by only 7%, which is within the acceptable range.
In relation to the reliability of the EIS, ω was .93, and ω_S was .93 and .80 for specific factors 1 and 2, respectively. All these values reveal an excellent composite reliability [the descriptors suggested by Cicchetti (1994) for Cronbach's alpha are extrapolated]. The ω_H is equal to .81, indicating that the general factor is the main source of variance in comparison with the specific factors. In this regard, ω_HS is .22 for factor 1, which can be considered moderate, and .16 for factor 2, which is low (Smits et al., 2014). The PRV would indicate that 87% of the reliable variance is due to the general factor and only 24% and 19% of the reliable variance to the specific factors. The H coefficient is equal to .92 for the general factor, which implies stability in other studies, while the specific H values are less than .70, providing evidence in favor of the general factor. Finally, the FD values for the general factor and the two specific factors are .94, .79, and .69, respectively, indicating that only the general factor score should be used for the analysis.

Conclusions
This work aimed to design a user-friendly online calculator for the auxiliary measures of the bifactor model, given that multidimensional models are increasingly common (Montes & Sanchez, 2019; Vuyk & Codas, 2019) and that the presence of a general factor should be verified empirically (Dominguez-Lara & Rodriguez, 2017; Flores-Kanter et al., 2018). Since a high correlation between factors is no longer considered sufficient evidence for a total score, it is necessary to examine this structure with a bifactor model (Anderson & Marcus, 2019). Some authors state that the bifactor model's goodness of fit tends to be positively biased (Bonifay et al., 2017; Gignac, 2008; Morgan et al., 2015); thus, it is necessary to explore auxiliary measures (Rodriguez et al., 2016). Despite this, no software was available for estimating these measures quickly and simply (in only three steps). The most similar options are an R package (Dueber, 2020), which requires programming skills, and Excel sheets (Dueber, 2017), which require the installation of an Office program.
In that sense, BifactorCalc is online software that, through a username and password, enables users to store bifactor models, modify factor loadings in case of errors, and estimate all auxiliary measures in only three steps. Regarding its validity, information from Yap et al. (2014) and estimates made by Rodriguez et al. (2016) were used to verify that BifactorCalc reported the same results. The same procedure was performed with the R package BifactorIndicesCalculator (Dueber, 2020). Thus, BifactorCalc reproduced the reference results in both comparisons.
In addition, this research provides an example of how the results obtained with BifactorCalc can be reported, framed in two major stages: the review of dimensionality and the review of reliability. In this way, the users of this software will be able to easily incorporate the results into their scientific manuscripts.
Finally, the software is expected to contribute to the scientific community in the field of psychology and to promote methodological best practices associated with the implementation of the bifactor models in the Spanish-speaking context.