Measuring dysfunctional parenting: Psychometrics of three versions of the Parenting Scale

Objective: This study assessed the psychometric properties of three versions of the Parenting Scale (PS; original PS, 13-item version, and 10-item version) in three European middle-income countries. Background: The PS is one of the most frequently used questionnaires for measuring dysfunctional discipline strategies. Although its validity has been extensively investigated in American samples, there are mixed results regarding the recommended number of items and subscales, raising the question of replicability across European middle-income countries. Method: Multigroup confirmatory factor analysis (MCFA) and item response theory (IRT) were applied to N = 835 parents from North Macedonia, Moldova, and Romania. Results: All three versions were significantly correlated with parental- and child-related variables. Confirmatory factor analysis indicated the best model fit for the 10-item version, and configural and


THE PARENTING SCALE
The Parenting Scale (PS) was developed to assess dysfunctional parental discipline strategies (Arnold et al., 1993) and is one of the most frequently employed questionnaires in clinical and research practice on parenting (Pritchett et al., 2011). Several intervention studies have used the PS to assess dysfunctional parenting as a main outcome, given its sensitivity to change (De Graaf et al., 2008; Nowak & Heinrichs, 2008). As a 30-item self-report measure, it consists of three subscales: Laxness (highly permissive and inconsistent parenting behavior), Overreactivity (harsh, impulsive, and aggressive parenting), and Verbosity (repeated talking instead of taking action). Participants are presented with different situations (e.g., "When my child misbehaves …") and asked to indicate their parenting behavior on a Likert scale ranging from 1 to 7 (e.g., 1 = I do something right away … to 7 = I do something later), with higher scores reflecting higher levels of dysfunctional parenting.
Over the past two decades, numerous studies have tried to replicate the original factor structure (Arnold et al., 1993), with relatively consistent empirical support for the use of Laxness and Overreactivity, but not Verbosity (Pritchett et al., 2011; Rhoades & O'Leary, 2007; Salari et al., 2012; Steele et al., 2005). Nevertheless, suggestions as to which and how many items best reflect the two remaining factors vary (Irvine et al., 1999; Lorber et al., 2014; Reitman et al., 2001; Rhoades & O'Leary, 2007) and have initiated the development of several shorter versions. Reitman et al. (2001), for instance, did not find empirical support for the original three-factor structure and instead suggested the use of a modified Laxness and Overreactivity scale, each consisting of five items. In a later study, Rhoades and O'Leary (2007) successfully replicated the version recommended by Reitman et al. (2001) but proposed a 13-item measure with parental hostility as a new factor (this version of the PS is referred to as the "13-item version" in this paper). The Hostility scale was composed of items originally assigned to Overreactivity but which appeared to be more indicative of harsh parenting (i.e., physical punishment, verbal aggression). Another short version, which was recently discussed, is the one by Lorber et al. (2014; referred to as the "10-item version" in this paper), who investigated which items of the PS best discriminated parents with more serious parenting problems. This shorter form consisted of five Laxness and five Overreactivity items and included those items providing most information along the continuum of functional to dysfunctional discipline behaviors in mothers and fathers. Parenting Scale items for each version were very similar, with one Laxness item being different for the 10-item version for men compared to women (i.e., PS7 "Threaten things I know I won't do" for mothers and PS21 "Offer something nice to behave if 'no' fails" for fathers).

AIM OF THE CURRENT STUDY
Although the validity of the PS has been extensively investigated in high-income countries (HICs), with the majority of studies coming from the United States (Freeman & DeCourcey, 2007; Lorber et al., 2014; Reitman et al., 2001; Rhoades & O'Leary, 2007; Steele et al., 2005), information on its performance in European middle-income countries (MICs) is currently limited. Although some studies validated translated versions of the PS in their respective countries, such as Germany, Australia, and Sweden (Kliem et al., 2019; Naumann et al., 2010; Prinzie et al., 2007; Salari et al., 2012), none of these studies were conducted in MICs or in former Communist-bloc countries, raising the question of the replicability of this measure in low-resource settings. As past research suggests, participants may benefit more from culturally adapted health interventions than from those that have not undergone this process (Griner & Smith, 2006; Sundell et al., 2016). For example, only recently, McWayne et al. (2017) suggested that positive parenting among low-income Black families may be different from positive parenting in other cultural contexts. To account for this, the authors developed the Black Parenting Strengths in Context measure to better honor the cultural diversity and context specificity of low-resource settings. This approach, however, is very cost-intensive and limited in its applicability, given the large number of different cultures. Another, more economical way of investigating and acknowledging this aspect of measurement is to focus on instruments that have been widely accepted and to specifically test their psychometric properties across heterogeneous samples.
In line with this, the current study aims to address this research gap by utilizing an exploratory approach to examine the psychometrics of the PS in three middle-income European countries: North Macedonia, the Republic of Moldova, and Romania. According to data from The World Bank (n.d.), 22%-25% of the population in these countries are living below the national poverty line and are affected by economic hardship. Furthermore, only a small number of support programs are available for families in these countries, even though the countries show elevated levels of family difficulties and violence (Levav et al., 2004; UNICEF, 2014). For instance, a study by Lansford et al. (2020) conducted in 21 low- and middle-income countries (LMICs), including Moldova, showed that 30% to 40% of participants found violence toward the wife to be an adequate reaction in 1 out of 5 situations, and 55% of participants further reported that they had used physical violence as a means of disciplining their child in the last 30 days, highlighting the importance of taking into account different beliefs and attitudes present in different cultures. Drawing on open questions, such as the applicability of short versions of the PS and the need for psychometric investigations of these versions in European MICs, the following research questions were examined in the current study: (1) How well does the original PS perform compared to two short forms (10-item version and 13-item version) in terms of validity and reliability in a sample of N = 835 caregivers from three MICs? (2) Do the three versions of the PS perform equally well across North Macedonia, the Republic of Moldova, and Romania with respect to validity and reliability?

METHOD

Participants and procedure
Data collection took place in North Macedonia, the Republic of Moldova, and Romania during March and April 2019 as part of a three-phase study (with the present data coming from the baseline assessment of Phase 2 [Lachman et al., 2019]). Potential participants were invited to take part in a parenting program for parents of children aged 2 to 9 years (Parenting for Lifelong Health for Young Children; PLH-YC) and were contacted by research coordinators via phone, letter, or in person. Participants were screened for eligibility and informed that they had the right to decline participation and/or withdraw from the program and/or the research evaluation at any time. Caregivers, aged 18 years or older, were eligible to participate if they had lived in the same household as their child for at least four nights a week in the previous month and reported that the child on whom they chose to focus for the program showed elevated levels of child behavior problems based on scores of 10 or above on the Child and Adolescent Behavior Inventory oppositional defiant disorder subscale (eight items; Burns et al., 2015).
Data assessors were local research assistants with intensive training in ethics, informed consent, and interviewing techniques. Questionnaires were completed by each caregiver, either using an electronic tablet or a paper-and-pencil interview-assisted response sheet. For their participation, all caregivers received a food/gift voucher for data completion at each assessment point. The study was approved by the Human Research Ethics Committee of the University of Klagenfurt and the Human Research Ethics Commission of the local institution of the respective country. Overall, 835 caregivers participated in the parenting program (96.0% female caregivers). The majority of caregivers reported being the biological parent of the child (92.1% biological mother, 3.8% biological father, 2.6% grandmother/grandfather, 1.3% other). Demographic information and sample characteristics can be found in Table 1.

Measures
Besides demographics, participants completed questionnaires on parenting behaviors, parental stress, and children's externalizing and internalizing symptoms.  The PS (Arnold et al., 1993) Laxness (11 items) and Overreactivity (10 items) subscales were used to measure dysfunctional parental discipline strategies. Participants were asked to rate their parenting behavior when encountering child misbehavior on a scale ranging from 1 to 7, with higher scores indicating more dysfunctional parenting strategies. For the purposes of this study, the original PS version (21 items) was administered, and the scores for the 13-item version and 10-item version were extracted from the data. Cronbach's α for the three versions can be found in the Results section.

Parenting of Young Children (PARYC)
Positive parenting was assessed using the PARYC (McEachern et al., 2012), a 21-item self-report measure in which parents were asked to indicate the amount of positive parenting behaviors on a scale from 1 = never to 7 = always (e.g., "Spend time with your child in ways that were fun for both of you?"). The PARYC includes three subscales: Positive Parenting, Monitoring, and Planning. Higher scores reflect more positive parenting. For the purposes of this study, only the total score was utilized (α = .89 for the total sample).

Parental Stress Scale (PSS)
Parental stress was measured utilizing the PSS (Berry & Jones, 1995). This self-report questionnaire comprises 18 items, which are rated on a Likert scale, ranging from 1 = strongly disagree to 5 = strongly agree (e.g., "I am happy in my role as a parent"). In addition to a total score, with higher values reflecting higher levels of parental stress, four subscales can be computed: Rewards, Stressors, Loss of control, and Satisfaction. In the current study, only the PSS total score was used (α = .80 for the total sample).

Child behavioral problems
Children's internalizing and externalizing behaviors were measured using the Child Behavior Checklist (CBCL) versions for younger (CBCL/1½-5 version) and older (CBCL/6-18 version) children (Achenbach, 1991; Achenbach & Rescorla, 2000). Parents were presented with 103 (younger children) or 113 items (older children) and asked to rate the frequency of certain child behaviors on a Likert scale (e.g., "Destroys things belonging to his/her family or other children"; 0 = not true; 1 = somewhat or sometimes true; 2 = very true or often true). In addition to a total score, scores for internalizing and externalizing were calculated. For the purposes of this study, a CBCL combined version (aggregated across older and younger children) was used (CBCL young: α = .94, CBCL old: α = .95 for the total sample). All questionnaires were translated into the local language (Macedonian, Moldovan, and Romanian) and translated back into English for verification of accuracy. The CBCL translations went through a translation process with the developers of the CBCL and bilingual child mental health experts in the local languages.

Analytical strategy
Overall, there were very low rates of missing data in this sample (<1%). To account for the non-normality of variables, maximum likelihood estimation with robust standard errors (MLR) was used as the estimator, and full information maximum likelihood (FIML) was utilized to account for missing data (Enders & Bandalos, 2001). The following statistical analyses were conducted to address the above-mentioned research questions: First, confirmatory factor analysis (CFA) was applied to investigate the validity of the different versions of the PS and obtain information on the performance of the PS in a middle-income European sample. Second, measurement invariance and chi-square difference testing (mean-adjusted Satorra-Bentler-scaled χ²) were conducted to assess whether the factor structure and loadings of the PS versions were the same among the three countries. This approach seemed relevant because the equality of the psychometrics of the PS across three countries must be tested explicitly to acknowledge potential differences in the subsamples that might be overlooked when only analyzing the total sample. Furthermore, each country used a different translation of the measure in accordance with the main language in the respective country, highlighting the need for additional country-level analysis. Third, the generalized partial credit model (GPCM) was applied to obtain information on how well the PS items discriminate among parents with different levels of lax and overreactive parenting, as well as to identify potentially problematic items. This model was utilized to complement the CFA approach and enable a more nuanced item-level analysis of the three versions of the PS overall and on the country level.
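The scaled chi-square difference test mentioned above can be illustrated with a minimal sketch. It assumes the commonly documented Satorra-Bentler scaled difference formula for robust (MLR-type) estimators, in which each model contributes its scaled chi-square value, its scaling correction factor, and its degrees of freedom. The function name and the numeric inputs are hypothetical and are not taken from the study.

```python
def sb_scaled_chi2_diff(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference statistic.

    t0, c0, d0: scaled chi-square, scaling correction factor, and degrees
                of freedom of the nested (more constrained) model.
    t1, c1, d1: the same quantities for the less constrained model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    # Rescale each statistic to its uncorrected value, then correct the difference
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Hypothetical values for a metric (nested) versus configural model
stat, df = sb_scaled_chi2_diff(t0=110.0, c0=1.2, d0=112, t1=95.0, c1=1.1, d1=102)
print(f"scaled chi-square difference = {stat:.3f} on {df} df")
```

The resulting statistic is referred to a chi-square distribution with the returned degrees of freedom; a significant value suggests that the added equality constraints (e.g., equal loadings across countries) worsen model fit.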
Furthermore, Pearson correlations were calculated to investigate convergent and divergent validity for the different PS versions, with respect to measures that have been theoretically linked to dysfunctional parenting behaviors, such as parental stress, positive parenting behaviors, and child mental health problems. To examine the reliability of the original PS, 13-item version, and 10-item version, Cronbach's α, including a 95% confidence interval (CI), was computed for the total sample and for each country. Statistical analyses were conducted using Mplus (Version 8.4; Muthén & Muthén, 2019). The following fit indices were applied to evaluate the overall model fit: the standardized root-mean-square residual (SRMR), the root-mean-square error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI). For SRMR and RMSEA, values <.05 are considered to indicate good model fit. For the CFI and TLI, values >.90 indicate acceptable model fit and values >.95 indicate good model fit (Hu & Bentler, 1999).
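As an illustration of the reliability coefficient named above, the following minimal sketch computes Cronbach's α from a complete respondent-by-item score matrix using the standard variance-based formula. It does not compute the 95% CI reported in the study, it assumes no missing responses, and the example data are invented for demonstration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent needs a score for every item")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented responses of five caregivers to three 7-point items
example = [
    [2, 3, 2],
    [4, 4, 5],
    [3, 2, 3],
    [6, 5, 6],
    [1, 2, 2],
]
print(f"alpha = {cronbach_alpha(example):.3f}")
```

Because α depends on the inter-item covariances, items that correlate weakly or negatively with the rest of their scale pull the coefficient down.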

Intercorrelations
The total scores of the three versions were significantly intercorrelated. The Laxness scale of the 13-item version was not significantly correlated with its corresponding Overreactivity and Harshness scales, the original PS Overreactivity scale, and the Overreactivity scale of the 10-item version (see Table 2). All three versions were significantly associated with measures of positive parenting behaviors, parental stress, and child behavioral problems (see Table 2). However, correlations for the subscales differed considerably, with the Laxness subscale of the 13-item version displaying the lowest convergent and divergent correlations.

Item level analysis
Descriptive statistics of the items and item wordings can be found in Table 3. F tests were used to compare item means across the three countries, accounting for family-wise error inflation. Item discrimination parameters and item location parameters from the GPCM for the total sample as well as separately for the three countries are presented in Table 4. For the interpretation of the item discrimination parameters, the following suggestions for cut-off values were used (Baker, 2001): negative values (indicative of problematic items and reverse response patterns), 0 (no discrimination), 0.01-0.34 (very low discrimination), 0.35-0.64 (low discrimination), 0.65-1.34 (moderate discrimination), 1.35-1.69 (high discrimination), and >1.70 (very high discrimination). Overall, item discrimination for the total sample and for the three countries ranged from low to moderate discrimination, except for PS12, which displayed a negative value for Moldova and Romania (α_j = −0.049 and α_j = −0.039, respectively).
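The cut-off scheme above can be expressed as a small lookup, sketched here for illustration. The published bands leave small gaps (e.g., between 0.34 and 0.35, and at exactly 1.70); treating each band as extending up to the next lower bound, and 1.70 itself as "very high", is an assumption of this sketch. The function name is hypothetical.

```python
def classify_discrimination(a):
    """Label an item discrimination parameter using the Baker (2001) bands."""
    if a < 0:
        return "negative (problematic)"
    if a == 0:
        return "none"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"


# PS12 estimates reported in the text for Moldova and Romania
for country, a in [("Moldova", -0.049), ("Romania", -0.039)]:
    print(country, a, classify_discrimination(a))
```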
According to Baker (2001) and de Ayala (2008), values for item location parameters typically lie between −3.00 and 3.00, with item location parameters less than −2.00 being classified as (very) easy, values around 0.00 as average, and values greater than 2.00 classified as (very) difficult, depending on the type of scale and context of assessment. Parenting Scale items ranged from easy to (very) difficult, with notable differences for PS12 across countries, as the item location parameters for the total sample and North Macedonia can be classified as relatively difficult (b_j = 2.556 and b_j = 1.472, respectively), while for Moldova and Romania the parameters can be classified as average to easy (b_j = −0.190 and b_j = −0.652, respectively). For a more nuanced item-level analysis, item information curves (IIC) were plotted for Laxness and Overreactivity items, which represent a helpful tool for identifying the amount of information each item provides along the continuum of functional to dysfunctional parenting (see Figure 1 and Figure 2). Figure 1 indicates that for Laxness, PS30 provided most information, particularly in the mid to higher range of Laxness. Items PS19 and PS20 provided most information in the midrange of Laxness, and overall, PS12 and PS7 provided least information along the whole continuum of lax parenting behaviors. Figure 2 displays the respective IICs for Overreactivity and shows that PS28 provides most information at the higher level and PS10 at the lower to mid-level of Overreactivity. Item PS9 was least informative along the continuum of low to high levels of overreactive parenting.
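To show how an item information curve follows from GPCM parameters, the sketch below computes category probabilities and item information at a given trait level. It assumes the common slope and step-parameter form of the GPCM with integer category scores 0, 1, 2, …; the software used in the study may parameterize the model differently, and the step values below are hypothetical rather than estimates from Table 4.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for one item under the GPCM.

    theta: latent trait level; a: item discrimination;
    steps: step parameters b_1..b_m for an item with m + 1 categories.
    """
    # Cumulative logits; the lowest category has an empty sum (0)
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    z_max = max(z)  # subtract the maximum for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def gpcm_information(theta, a, steps):
    """Item information: a^2 times the variance of the category score."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    second = sum(k * k * pk for k, pk in enumerate(p))
    return a * a * (second - mean ** 2)


# Hypothetical 7-category item evaluated on a grid of trait levels
steps = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(gpcm_information(theta, 0.9, steps), 3))
```

Because information scales with the squared discrimination, an item with a discrimination close to zero, such as the PS12 estimates reported for Moldova and Romania, contributes almost no information anywhere on the continuum.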

13-item version
Overall, this version did not fit the data well (total sample: CFI = .781; TLI = .725; RMSEA = .056). However, fit information varied considerably across the three countries, with North Macedonia showing the best fit indices (CFI = .929; TLI = .911; RMSEA = .037), and Moldova (CFI = .741; TLI = .675; RMSEA = .063) and Romania (CFI = .736; TLI = .668; RMSEA = .075) displaying unsatisfactory model fit. For the total sample, loadings ranged from .066 to .670 for Laxness and from .313 to .644 for Overreactivity. Loadings for Harshness ranged from .474 to .532 for the total sample (see Table 6 for more results). Confirmatory factor analysis results on the country level indicated a negative factor loading of −.187 for Moldova and −.587 for Romania; however, convergence problems emerged when trying to fit the three-factor version to the Romanian sample. After checking for inter-item correlations of the respective Laxness and Overreactivity scales, item PS12 was identified as problematic, indicating significant negative correlations with items from the Overreactivity scale (ranging from −.082 to −.535) and nonsignificant correlations with items from the same scale (ranging from −.077 to −.109).
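For illustration, the fit criteria given in the Analytical strategy section (CFI and TLI >.90 acceptable, >.95 good; RMSEA <.05 good) can be applied to the indices reported above for the 13-item version. The labels for values outside these bands and the helper names are assumptions of this sketch, not terms used by the authors.

```python
def rate_incremental_fit(value):
    """Rate a CFI or TLI value: > .95 good, > .90 acceptable."""
    if value > 0.95:
        return "good"
    if value > 0.90:
        return "acceptable"
    return "below acceptable"


def rate_rmsea(value):
    """Rate an RMSEA value: < .05 good; no further band is defined in the text."""
    return "good" if value < 0.05 else "not good"


# Fit indices for the 13-item version as reported in the text
reported = {
    "Total sample": (0.781, 0.725, 0.056),
    "North Macedonia": (0.929, 0.911, 0.037),
    "Moldova": (0.741, 0.675, 0.063),
    "Romania": (0.736, 0.668, 0.075),
}
for sample, (cfi, tli, rmsea) in reported.items():
    print(sample, rate_incremental_fit(cfi), rate_incremental_fit(tli), rate_rmsea(rmsea))
```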
TABLE 4 Item discrimination parameters for the total sample and three different countries for lax and overreactive parenting

Factorial invariance across countries
Tests for measurement invariance across countries were only conducted for the 10-item version, given the unsatisfactory model fit for the original PS Laxness and Overreactivity scales and the 13-item version. Results showed that configural invariance was supported for the 10-item

FIGURE 1 Item information curves (IIC) of the Laxness Scale. Note. Analyses were run across all Laxness items utilizing the total sample of N = 835. For wording of the respective items, see Table 3.
TABLE 6 CFA loadings and internal consistencies for the 10-item version and 13-item version

10-item version
Internal consistency for the 10-item version was as follows:

DISCUSSION
The PS is one of the most commonly used questionnaires for assessing dysfunctional parenting practices in young children (Pritchett et al., 2011). Due to its sensitivity to intervention impact, it has been frequently employed in clinical as well as research practice (Sanders, 2008; Thomas & Zimmer-Gembeck, 2007). Although the psychometrics of the PS have been extensively investigated in HICs, such as the United States, the literature on its performance in European MICs is sparse. Furthermore, the majority of past studies focused on exploratory factor analysis (EFA) and/or CFA approaches and have mostly analyzed differences in the PS's performance across child and parental age, child gender, parental gender, or parental education levels (Karazsia et al., 2008; Kliem et al., 2019; Steele et al., 2005). However, to date, it has not yet been examined how well different versions of the PS perform across European MICs. This paper, therefore, aimed to address this research gap by investigating the psychometrics of the PS across North Macedonia, Moldova, and Romania. Overall, the 10-item version showed the best model fit for the total sample as well as for the three countries separately. While configural invariance was supported for this measure, metric invariance could not be established, given that strengths of the factor loadings varied across the three countries. However, partial metric invariance could be established after removing two constraints for PS7 and PS20 for Moldova versus Romania and PS7 for North Macedonia versus Moldova. Furthermore, IICs indicated that items from the 10-item measure (such as PS30, PS19, and PS20 for Laxness and PS28, PS22, and PS10 for Overreactivity) were among the most informative items from the PS in the current sample.
The poorer performance of the original measure, as well as the 13-item version compared to the 10-item short form, could be attributed to problematic items, such as PS12, which were not included in the 10-item version and thus could have reduced misfit in the CFA model. However, although PS12 worked for North Macedonia in the original PS and the 13-item version, measurement-related issues became apparent in Moldova and Romania, displaying problematic factor loadings and negative item discrimination parameters in these two countries. Further country-level analysis indicated that compared to Moldova (CBCL Extern.: 45%; CBCL Intern.: 53%) and Romania (CBCL Extern.: 37%; CBCL Intern.: 28%), parents in North Macedonia reported lower levels of child behavioral and emotional problems (CBCL Extern.: 18%; CBCL Intern.: 19%; calculated on t values ≥60), with values comparable to the validation samples of the original PS and the 13-item version (Arnold et al., 1993;Rhoades & O'Leary, 2007). These results point to potentially worse applicability of the 13-item version to countries with elevated levels of child behavioral and emotional problems. Additionally, Laxness IICs indicated that PS12 provided very little information along the continuum from low to high levels of dysfunctional parenting. These results suggest problems related to the wording and content of this item, pointing to the difficulty of adapting it to a different cultural context than that for which it was originally developed. Indeed, consultation with the Moldovan and Romanian research teams indicated that while the item content might be equally captured in both languages, problems could be related to a potentially less discriminative response format of PS12 in Moldovan and Romanian compared to the original English wording ("I firmly tell my child to stop … I coax or beg my child to stop"). 
This is also in line with research suggesting that concepts of parental discipline strategies may be perceived differently across cultures and social backgrounds (Bozicevic et al., 2016; Chao, 2000; McWayne et al., 2017), highlighting the relevance of cost-effective cultural adaptation processes of parenting measures, such as the PS.
Taken together, all three versions investigated in this paper were significantly associated with measures of positive parenting, parental stress, and child behavioral and emotional problems, which is in line with results from studies evaluating the validity of the PS in other European or American samples. For instance, Rhoades and O'Leary (2007) investigated the psychometrics of the 13-item version in 453 couples from New York and found that Overreactivity, Laxness, and Hostility were significantly associated with child externalizing (r = .33, r = .15, and r = .27, respectively) and impulsive behaviors (r = .26, r = .14, and r = .21, respectively) at the p < .01 or p < .001 level. In the study by Lorber et al. (2014), the authors analyzed two American subsamples (N = 453 and N = 399), showing that the original PS, the 13-item version, and the 10-item version were significantly related to child externalizing behaviors for both fathers and mothers at the p < .05 level (r_mothers = .47, r_fathers = .39; r_mothers = .42, r_fathers = .32; r_mothers = .45, r_fathers = .34, respectively). Salari et al. (2012) recruited 617 mothers and 430 fathers from Sweden and also found significant associations for the original PS version and the 13-item version with child emotional (r = .16, r = .16) and behavioral problems (r = .45, r = .42), as well as parental stress (r = .29, r = .25) at the p < .05 level. Results of the current study, therefore, align with findings from parents in the United States and other European samples, indicating that associations of the three versions of the PS with measures of parental stress and child emotional and behavioral problems are somewhat replicable across different countries and cultures. While not all studies, including the present one, utilized the same measure to assess convergent and divergent validity (for example, Rhoades & O'Leary [2007] used the Eyberg Child Behavior Inventory, while Lorber et al. [2014] applied the MacArthur Health and Behavior Questionnaire, and Salari et al. [2012] utilized the Strengths and Difficulties Questionnaire as well as the Eyberg Child Behavior Inventory), these findings provide preliminary evidence of convergent and divergent validity of the PS as a measure for assessing dysfunctional parenting behaviors in various samples.
The psychometric results of the current study were most supportive of the 10-item version, given the good model fit overall and on the country level, results of the IICs, and configural and partial metric invariance across North Macedonia, Moldova, and Romania. Although reliability of the 10-item version was lower than what was reported in the original study, reliability indices of this version were similar to the corresponding long version in the current sample (10-item version: α = .638; original PS: α = .663). As Lorber and Slep (2018) showed, internal consistency is not always a solid indicator of the measurement precision of a questionnaire. Although the authors obtained low values of internal consistency for the Parent-Child Conflict Tactics Scale (CTS-PC) in their study, further IRT analysis showed that these items provided important evidence for differentiating parents with serious parenting problems from those without. Following these results, lower reliability coefficients in the current sample do not necessarily indicate that the PS is an imprecise measure for assessing dysfunctional parenting in the current sample. Rather, low internal consistency could be attributed to heterogeneity in discipline behavior: For instance, some parents might report holding a grudge after child misbehavior but might not yell at their children or call them mean names when encountering a difficult situation. Such heterogeneity in parenting strategies can potentially result in lower correlations between items assigned to the same subscale, which also influences reliability indices of this measure.

Implications for research and practice
Overall, researchers and practitioners are advised to use the 10-item version to evaluate dysfunctional parenting behaviors in European samples from MICs, given its more economical applicability (10 instead of 21 items) and better psychometric performance compared to the original PS and the 13-item version. Nevertheless, future investigations of the 10-item version might be beneficial to further improve the psychometrics of this measure. For example, PS7 displayed low factor loadings and item discrimination parameters in this sample and was identified as non-invariant in multigroup analyses, indicating room for improvement, such as the revision or replacement of this item. Moreover, we recommend future revisions of the PS not solely be based on psychometric analyses but also include theoretical considerations on dysfunctional parenting. In particular, the importance of cultural aspects and different parenting norms, as well as concepts, needs to be taken more into consideration to ensure a valid assessment of dysfunctional parenting across various samples.

Strengths and limitations
Recent calls for the implementation of evidence-based and cost-effective interventions in LMICs have also raised awareness of the need to investigate the cross-cultural and cross-country adaptability of instruments used in prevention and intervention programs. This paper is the first to evaluate the psychometric properties of three versions of the PS in a sample of three European MICs, providing researchers and clinicians with recommendations on the future use of this questionnaire in low-resource settings. Although the participation rate was relatively high (83.1%), the majority of parents taking part in the program were female, and therefore, no conclusions can be made about the performance of the different PS versions among male caregivers. Recruitment of fathers for family research has remained a challenging topic (Fabiano, 2007; Phares et al., 2005) and certainly requires more attention in future research. One significant strength of this study, however, lies in the sample, offering insight into the applicability and performance of the PS in three under-researched European MICs. Future research, however, is needed to investigate the generalizability of the current results to other European low-resource MICs. As expected, means for dysfunctional parenting behaviors reported in the current sample (M = 3.08, SD = 0.72) were somewhat higher compared to American studies (M = 2.68, SD = 0.85; Lorber et al., 2014), underlining the importance of further research and intervention efforts in these countries, as well as the need to address measurement issues and prevention efforts for child and parental mental health simultaneously.