The impact of the inappropriate modeling of cross-classified data structures

Meyers, Jason Leon

Hierarchical linear modeling (HLM) is typically used in the social sciences to model data from clustered settings, such as students nested within classrooms. However, not all multilevel data are purely hierarchical. For example, students are nested both within the neighborhoods in which they live and within the schools they attend. Yet students from a given neighborhood most likely do not all attend the same school, and students from a given school do not all reside in the same neighborhood. Because neighborhoods are not nested within schools, nor schools within neighborhoods, the two factors are said to be cross-classified. Cross-classified random effects modeling (CCREM) is used to model data from such non-hierarchical contexts. Although the use of CCREM has increased in disciplines such as medicine, it is seldom used in educational research. CCREM is discussed in most multilevel modeling textbooks (for example, Raudenbush & Bryk, 2002; Hox, 2002; Snijders & Bosker, 1999), yet it remains infrequently applied, most likely because the models are technically sophisticated and can be difficult to interpret. Little research has assessed when it is necessary to use CCREM, so this dissertation addressed that question through a simulation study and an applied follow-up study. First, a Monte Carlo simulation study investigated factors that may affect the need to use CCREM, as well as the impact of ignoring cross-classification. Second, CCREM was applied to a large-scale national data set to provide insight into the potential effects of ignoring a cross-classified data structure. Results of both studies indicated that when HLM was used instead of CCREM, the fixed effect estimates were unaffected, but the standard error estimates associated with the incorrectly modeled variables were biased. In addition, the estimates of the variance components were biased. 
The observed bias was related to the proportion of the total variance lying between the units of each cross-classified factor, the sample size, and the similarity of the cross-classified factors. Implications and limitations are discussed, and suggestions for future research are presented.
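To make the idea of cross-classification and "proportion of the total variance between each factor" concrete, the following is a minimal sketch, not the dissertation's simulation design. It assumes a simple two-factor model, y = gamma0 + u[j] + v[k] + e, with normally distributed random effects for neighborhoods (u) and schools (v), and with each student's neighborhood and school drawn independently. The function names, the grand mean of 50, and the variance values are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict


def simulate_cross_classified(n_students, n_neighborhoods, n_schools,
                              var_neigh, var_school, var_resid, seed=0):
    """Generate (neighborhood, school, outcome) rows from a hypothetical
    two-factor cross-classified random effects model:
        y = gamma0 + u[j] + v[k] + e
    with u[j] ~ N(0, var_neigh), v[k] ~ N(0, var_school), e ~ N(0, var_resid).
    Neighborhood and school are assigned independently, so neither factor
    is nested within the other."""
    rng = random.Random(seed)
    gamma0 = 50.0  # hypothetical grand mean
    u = [rng.gauss(0.0, var_neigh ** 0.5) for _ in range(n_neighborhoods)]
    v = [rng.gauss(0.0, var_school ** 0.5) for _ in range(n_schools)]
    rows = []
    for _ in range(n_students):
        j = rng.randrange(n_neighborhoods)
        k = rng.randrange(n_schools)
        e = rng.gauss(0.0, var_resid ** 0.5)
        rows.append((j, k, gamma0 + u[j] + v[k] + e))
    return rows


def is_hierarchical(rows):
    """Return True if either factor is purely nested within the other:
    every neighborhood maps to a single school, or every school draws
    from a single neighborhood. Otherwise the factors are cross-classified."""
    schools_by_neigh = defaultdict(set)
    neighs_by_school = defaultdict(set)
    for j, k, _ in rows:
        schools_by_neigh[j].add(k)
        neighs_by_school[k].add(j)
    neigh_in_school = all(len(s) == 1 for s in schools_by_neigh.values())
    school_in_neigh = all(len(n) == 1 for n in neighs_by_school.values())
    return neigh_in_school or school_in_neigh


def variance_proportions(var_neigh, var_school, var_resid):
    """Share of the total variance lying between neighborhoods and
    between schools (one intraclass correlation per factor)."""
    total = var_neigh + var_school + var_resid
    return var_neigh / total, var_school / total


if __name__ == "__main__":
    data = simulate_cross_classified(2000, 20, 10, 10.0, 20.0, 70.0)
    print("purely hierarchical?", is_hierarchical(data))
    print("variance proportions:", variance_proportions(10.0, 20.0, 70.0))
```

With 2,000 students spread independently over 20 neighborhoods and 10 schools, `is_hierarchical` reports `False`, and the variance proportions are `(0.1, 0.2)`. An HLM analysis of such data would have to keep only one of the two factors as the clustering level; actually fitting and comparing CCREM and HLM would require a mixed-model package (for example, crossed random intercepts in R's lme4) and is beyond this sketch.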