Implementation fidelity of universal design for learning and effects on student achievement engagement and belonging

This study employed a comprehensive mixed-methods longitudinal research design to investigate the implementation of the Universal Design for Learning (UDL) framework across diverse educational contexts and examine its impact on student achievement and engagement. The methodological approach integrates quantitative measurement of implementation fidelity and student outcomes with qualitative exploration of implementation processes and contextual factors.

Research design and theoretical framework

The study employed a convergent parallel mixed-methods design in which quantitative and qualitative data were collected simultaneously. The design drew on three theoretical perspectives: UDL principles¹, implementation science, and ecological systems theory. Together, these perspectives hold that educational change occurs at multiple levels and that the implementation of UDL involves a complex web of relationships spanning personal, school, and policy levels.

Figure 1 outlines the conceptual model of the study. It shows how three theoretical strands (UDL principles, implementation science, and ecological systems thinking) jointly shape the UDL implementation process. The core UDL principles of engagement, representation, and action and expression provide the foundation for teaching all learners. The main purpose of the model is to examine how well UDL practices are implemented, which makes it possible to observe how the principles hold up when they are put into practice.

Implementation science concerns how new teaching methods are put into practice and how they change behavior. Its key concerns are fidelity to the plan, adaptation as needed, and sustainability over time, which together help explain why implementation quality matters. The ecological systems view provides a framework for understanding how these changes occur at multiple levels: in small settings such as classrooms, in larger settings such as schools, and in broad settings such as government regulation.

As illustrated in Fig. 1, the integration model highlights the mediating role of implementation fidelity in the relationship between the core UDL principles and student outcomes. Achieving meaningful application of these principles involves more than simply adopting UDL techniques; it demands critical reflection on how these principles are enacted within specific contextual constraints and opportunities.

The lower portion of Fig. 1 highlights how educational systems operate across multiple nested levels, including individual, institutional, and broader policy contexts. These layers are interdependent, and the challenges at each level have a mutual influence. Effectively distinguishing between these interconnected factors is vital for informing both policy and practical implementation efforts.

This multi-level, ecological perspective is fundamental for understanding why UDL implementation outcomes vary across different settings. It also forms the basis for designing comprehensive strategies that address systemic barriers and leverage facilitators at every level. According to this framework, successful UDL implementation hinges on coordinated alignment across system layers, with fidelity of implementation serving as the key link between adoption and positive educational outcomes.

Notably, the framework acknowledges that implementation quality will vary by context, and that understanding this variability is crucial for developing scalable and sustainable implementation models. From this ecological viewpoint, factors such as institutional culture, resource availability, faculty preparedness, policy alignment, and administrative support all interact to shape the success of UDL initiatives.

Participants and settings

Purposive sampling was used to select participants from a broad range of settings. The study involved 2,473 students and 342 teachers from 87 institutions spanning K–12 (elementary through high school) and a small subset of postsecondary sites (community colleges and teacher education programs). These institutions represented a wide range of UDL use, from early adopters to advanced implementers. To capture the different ways UDL is put into practice, participants were drawn from urban, town, and rural areas across several regions.

Student participants had different traits, such as:

Faculty participants included:

Pre-service interns
Early-career teachers
Experienced teachers
Trainers skilled in UDL

The participating institutions also differed in:

Size
Available resources
Administrative support for inclusion
Technological infrastructure

This broad, deliberately varied sample allowed us to examine how well UDL worked across different types of school settings.

Participant selection within institutions

We employed a purposive, criterion-based sampling strategy to select participants within each institution. Instructor selection prioritized variability in (a) UDL exposure (none/introductory/comprehensive), (b) teaching experience (novice/intermediate/expert), and (c) disciplinary area, as verified by institutional liaisons. Instructors were invited via email to participate in information sessions, and their participation proceeded upon informed consent. Classroom selection targeted courses taught by participating instructors during the study waves; where multiple sections existed, we selected sections to maximize schedule and demographic diversity. Student participation was inclusive of all enrolled learners in selected classes, with opt-out procedures in place and parental/guardian consent required for minors. To mitigate selection bias within institutions, we monitored uptake by department and experience level, and when imbalances emerged, we sought additional instructors to restore heterogeneity. This protocol ensured transparency in identifying individual participants within the broader institutional sample.

Research sites and implementation background

The 87 participating institutions spanned urban, suburban, and rural contexts across multiple administrative regions within the study jurisdiction. In line with IRB and institutional confidentiality requirements, we report regional rather than institution-identifying details. Sites included K–12 schools (elementary through secondary) and a subset of postsecondary providers.

UDL adoption was not simultaneous nor motivated by a single driver. Institutions joined the project in staggered cohorts reflecting authentic system conditions:

Cohort A (project inception): sites with pre-existing inclusion or accessibility initiatives;
Cohort B (~6 months): sites aligning UDL with emerging policy or accreditation requirements;
Cohort C (~12 months): sites entering via faculty-led or department-level innovation.

Baseline UDL maturity varied from novice (limited awareness and ad hoc accommodations) to early adopters (pilot use of UDL-aligned practices) to advanced (coordinated use of UDL principles with some institutional support). Entry rationales, documented in site logs, included equity and inclusion strategies, compliance (policy/accreditation), instructional quality improvement, and technology-enabled pedagogy. This heterogeneity explains the variability observed in implementation fidelity trajectories and outcome profiles. Where relevant, analyses adjust for cohort and baseline maturity to enhance causal interpretability.

Research design and data collection procedures

The study was conducted over a 24-month period. As shown in Fig. 2, the quantitative and qualitative components were developed in parallel, so numerical and narrative data were gathered simultaneously. This design preserves the study’s timeline, which is crucial for understanding how UDL implementation unfolds step by step.

Figure 2 identifies five measurement points, beginning with baseline data collection and recurring every six months. These points were set to capture both the immediate results of initial implementation and the lasting impact of UDL practices. This time-sensitive design gives a clearer picture of how implementation unfolds across stages and identifies critical periods when extra support or a targeted boost may be needed.

As illustrated in Fig. 2, the quantitative component of the study encompassed four core measurement domains, allowing for a comprehensive evaluation of both implementation processes and outcomes. Student achievement was assessed using standardized assessments for cross-context comparison, alongside course-specific performance indicators aligned with discipline-based learning objectives. Student engagement was evaluated using a combination of behavioral observation protocols and validated self-report instruments, capturing its multifaceted nature.

Fidelity of implementation was assessed through structured observations and detailed checklists, allowing for the evaluation of adherence to the three UDL principles: engagement, representation, and action/expression. The analysis of institutional factors focused on contextual elements, including resource availability, administrative support, and technological infrastructure, all of which are known to influence effective implementation.

The qualitative strand complemented and extended the quantitative results. Faculty interviews provided information on the implementation experience, identifying both successes and difficulties. Student focus groups examined the extent to which UDL practices influenced accessibility and participation. Ethnographic observations captured real-time instructional dynamics and social interactions, and document analysis of institutional policies and training materials provided deeper insight into the formal supports for implementation.

The integration phase synthesized results from both data strands using joint displays and meta-inference, which highlighted areas of agreement and disagreement. Because the data were nested, multilevel modeling was used to examine the relationship between implementation variables and outcomes at the individual, classroom, and institutional levels. This integrated analysis informed the development of evidence-based frameworks and models to support future implementation of UDL in various educational settings.

Measures and instruments

The researchers employed a combination of well-established tools and innovative measures to capture the complex impact and implementation of Universal Design for Learning (UDL). Student performance was assessed using context-appropriate standardized assessments, course-specific performance indicators, and authentic assessment portfolios that reflected real-world learning outcomes.

To evaluate student engagement, the study utilized the Student Engagement Scale, behavioral observation protocols, and learning analytics gathered from educational technology platforms. Social belonging was measured using a modified version of the Psychological Sense of School Membership scale, tailored to fit diverse educational settings.

Implementation fidelity was assessed using a newly developed multidimensional framework that addressed structural, process, and outcome-level indicators. This framework combined systematic classroom observations, educator self-reports, and artifact analysis to evaluate the extent to which UDL principles—engagement, representation, and action/expression—were integrated into teaching practices. Instruments developed by the UDL Implementation and Research Network were also employed to assess faculty knowledge, attitudes, and confidence, supplemented by implementation confidence scales and knowledge-based assessments.

Institutional factors were measured through comprehensive organizational surveys that examined administrative support, availability of resources, professional development opportunities, and policy alignment. Evaluation of digital infrastructure and technology usage analytics revealed an upward trend in the integration of classroom technology by teacher educators.

Qualitative instruments included semi-structured interview protocols for faculty and administrators, focus group guidelines for student participants, and ethnographic observation frameworks. These tools captured the lived experiences of implementation, revealing the contextual factors and classroom processes that influenced the success of UDL strategies.

Measurement instruments and source citations

The Multi-Dimensional UDL Implementation Fidelity Framework (developed in this study; see Sect. 2.5 and Fig. 3) was operationalized via (a) structured classroom observation rubrics aligned to the three UDL principles; (b) artifact reviews (lesson plans, assessments, student work) to evidence structural and process indicators; and (c) student experience surveys indexing outcome fidelity (engagement, belonging, perceived access).

We administered instruments adapted from the UDL Implementation and Research Network (UDL-IRN) to assess knowledge of UDL principles, attitudes toward inclusive design, and confidence in implementation. Subscales covered engagement, representation, action/expression, and general inclusion beliefs; items used 5- or 7-point Likert formats, with higher scores indicating stronger alignment. Internal consistency for primary subscales met accepted thresholds (α and ω ≥ 0.80).

Engagement was measured via a validated student engagement scale (behavioral, emotional, and cognitive components), augmented by behavioral observation protocols and learning analytics (LMS interaction patterns). Social belonging was assessed using a modified Psychological Sense of School Membership instrument, adapted for both K–12 and postsecondary contexts. All instruments underwent psychometric checks (confirmatory factor analysis, measurement invariance across key subgroups) prior to primary analyses; reliability estimates (α, ω) were acceptable across waves.
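As a rough illustration of the internal-consistency check mentioned above, Cronbach's α can be computed from an item-by-respondent score matrix as follows. This is a minimal sketch, not the study's actual procedure: the function name `cronbach_alpha` and the Likert responses in `demo` are hypothetical, and ω would additionally require a fitted factor model, which is not shown.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance (n - 1 denominator) of each item across respondents.
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert data: 6 respondents x 4 items (illustration only).
demo = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 3))
```

A subscale would meet the ≥ 0.80 threshold reported above when `alpha` is at least 0.80.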

Student achievement was evaluated through combined standardized assessments for cross-site comparability and course-embedded indicators aligned with discipline outcomes, harmonized to standard scales for analysis. For longitudinal modeling, repeated measures were standardized within site-wave.
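Standardizing repeated measures within each site-wave cell amounts to computing z-scores separately for every (site, wave) group. The sketch below shows one way this could be done; the record layout (`site`, `wave`, `score` keys) and the function name `standardize_within` are assumptions made for illustration, not the study's data format.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within(records, keys=("site", "wave"), value="score"):
    """Return copies of records with a 'z' field standardized within each key group."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec[value])
    # Mean and sample SD per group; a single-member group gets SD 0.
    stats = {
        g: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for g, vals in groups.items()
    }
    out = []
    for rec in records:
        m, s = stats[tuple(rec[k] for k in keys)]
        # Groups with no spread are assigned z = 0 rather than dividing by zero.
        z = (rec[value] - m) / s if s > 0 else 0.0
        out.append({**rec, "z": z})
    return out


# Hypothetical scores on different raw scales at two sites.
rows = [
    {"site": "A", "wave": 1, "score": 10},
    {"site": "A", "wave": 1, "score": 20},
    {"site": "A", "wave": 1, "score": 30},
    {"site": "B", "wave": 1, "score": 500},
    {"site": "B", "wave": 1, "score": 700},
]
standardized = standardize_within(rows)
```

One consequence of this choice is that between-site mean differences at a given wave are removed, so the resulting scores describe relative standing within a site-wave rather than absolute achievement.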

Implementation fidelity framework

Among the most significant methodological developments of this study was the creation of an extensive implementation fidelity framework that extends beyond compliance measures to consider the quality of UDL implementation. Figure 3 outlines the structure of this multidimensional framework, which describes the relationship between the structural implementation of UDL and its success in terms of process and outcome. According to Fig. 3, fidelity assessment comprises three dimensions that are closely related yet distinct. The structural fidelity dimension assesses the presence of UDL components. This involves systematically assessing whether varied resources for learning and for engaging students are available.

Figure 3 presents the Process Fidelity Dimension, which focuses on the quality of UDL implementation practices. The mere presence of UDL components does not guarantee effective execution. This dimension assesses whether instructional design practices promote coherent and meaningful learning experiences, and whether there is a structured approach to evaluating the thoroughness of applying UDL principles. It also includes a student-centered implementation assessment, which looks at the extent to which educators demonstrate responsive, individualized instruction that reflects an authentic understanding of learner variability.

Additionally, this dimension assesses ongoing instructional improvement, recognizing that high-quality implementation is not a static state. It emphasizes the importance of continuous teacher reflection and adaptation of strategies based on student performance and feedback.

The Fidelity of Outcomes Dimension, also illustrated in Fig. 3, examines whether the intended goals of UDL have been realized. High-quality implementation should result in measurable improvements in students’ experiences and learning outcomes. Academic achievement is evaluated to determine whether intended learning objectives are being met across diverse learner groups. Student engagement is assessed through both behavioral indicators and subjective self-reports of participation and motivation.

Finally, the inclusive outcomes assessment evaluates whether UDL practices contribute to equitable academic success and foster a sense of belonging, particularly among historically marginalized student populations. This ensures that implementation is not only practical but also equitable.

The measurement methods portion of Fig. 3 illustrates the comprehensive approach to data collection, which supports a reliable and valid assessment of all three fidelity dimensions. Multiple trained observers administer the structured classroom observation rubrics to enhance inter-rater reliability and ensure coverage of implementation practices. Document analysis examines lesson plans, assessments, and student products to provide objective evidence of the implementation components and of their quality. Faculty self-assessment tools allow educators to share their perspectives on their implementation efforts while building capacity to support continuous improvement. Surveys and focus groups gather student experiences and learner voice, providing evidence of implementation effectiveness as seen by the recipients. Learning analytics yield objective behavioral measures that supplement the observational and self-report data.

The fidelity scoring system, as represented in Fig. 3, enables the categorization of implementation quality across institutions, providing feedback for both formative improvement and summative research purposes. The four-level system recognizes that effective UDL implementation exists on a continuum and that different contexts may require different approaches, as long as the core principles are maintained. The composite scoring method combines all three fidelity dimensions in a consistent way, although the dimensions can be weighted contextually to reflect the different implementation priorities and constraints across settings.
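The composite scoring idea can be sketched as a weighted average of the three dimension scores mapped onto four ordered levels. The source does not report the actual weights, cut points, or level names, so everything numeric below (equal default weights, cutoffs at 0.25/0.50/0.75, scores on a 0 to 1 scale, the labels in `LEVELS`) is a hypothetical placeholder.

```python
# Hypothetical level labels; the study's four level names are not given in the text.
LEVELS = ["Level 1", "Level 2", "Level 3", "Level 4"]


def composite_fidelity(structural, process, outcome,
                       weights=(1 / 3, 1 / 3, 1 / 3),
                       cutoffs=(0.25, 0.50, 0.75)):
    """Weighted composite of three fidelity scores in [0, 1], plus a four-level label."""
    scores = (structural, process, outcome)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("dimension scores must lie in [0, 1]")
    composite = sum(w * s for w, s in zip(weights, scores))
    # Count how many (assumed) cut points the composite reaches: 0..3.
    level_index = sum(composite >= c for c in cutoffs)
    return composite, LEVELS[level_index]


# Equal weighting versus a context-specific weighting that emphasizes structure.
equal_score, equal_level = composite_fidelity(0.8, 0.6, 0.4)
tuned_score, tuned_level = composite_fidelity(0.8, 0.6, 0.4, weights=(0.5, 0.3, 0.2))
```

Keeping the weights as an explicit argument mirrors the contextual weighting described above while leaving the level boundaries fixed, so sites scored under different weightings remain comparable on the same four-level scale.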

Effective implementation of UDL requires adherence to core principles and their adaptation to local contexts. The model integrates a set of measurement approaches, including structured classroom observations with validated rubrics, evaluations of instructional resources and assessments, educator self-report tools, and student experience surveys. Each UDL principle is evaluated using multiple indicators to reflect differences in implementation across situations and learner groups. The framework thus acknowledges that high-fidelity implementation can take various forms, depending on the environment, while still adhering to the primary principles of Universal Design for Learning.

Data analysis procedures

Quantitative analysis formed the first analytic stage of the mixed-methods study and was interpreted in conjunction with the qualitative strand. This design facilitated a multi-layered interpretation of the data, combining the two strands to provide a more detailed view of the findings. In the quantitative part, multilevel techniques were used to capture the nested nature of the data, with students nested within classrooms and classrooms within institutions. Hierarchical Linear Modeling (HLM) was used to investigate the association between UDL implementation fidelity and student outcomes while controlling for both individual- and institutional-level factors. Growth curve modeling was employed to assess longitudinal change in student achievement and engagement over 24 months.
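For readers less familiar with HLM notation, a three-level random-intercept model of the kind described above can be written as follows. This is an illustrative specification only, not the model reported by the authors: \(Y_{ijk}\) is the outcome for student \(i\) in classroom \(j\) of institution \(k\), \(X_{ijk}\) stands for a student-level covariate, \(F_{jk}\) for a fidelity score, and \(W_{k}\) for an institutional factor. Placing fidelity at the classroom level is an assumption.

```latex
\begin{align*}
Y_{ijk}      &= \pi_{0jk} + \pi_{1jk}\, X_{ijk} + e_{ijk},
             & e_{ijk} &\sim N(0, \sigma_e^2) \\
\pi_{0jk}    &= \beta_{00k} + \beta_{01k}\, F_{jk} + r_{0jk},
             & r_{0jk} &\sim N(0, \sigma_r^2) \\
\beta_{00k}  &= \gamma_{000} + \gamma_{001}\, W_{k} + u_{00k},
             & u_{00k} &\sim N(0, \sigma_u^2) \\
\pi_{1jk}    &= \gamma_{100}, \qquad \beta_{01k} = \gamma_{010} \\[4pt]
\rho_{\text{institution}} &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_r^2 + \sigma_e^2}
\end{align*}
```

Under this sketch, \(\gamma_{010}\) is the fixed effect of fidelity on the outcome, and \(\rho_{\text{institution}}\) is the institution-level intraclass correlation obtained from the variance components of the corresponding unconditional model.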

Furthermore, latent profile analysis was conducted on implementation fidelity scores to identify distinct institutional patterns in the application of UDL. These profiles were subsequently incorporated into multilevel models to predict differential student outcomes based on specific implementation typologies. Effect sizes were calculated using Cohen’s conventions, with an emphasis on practical significance, especially given the large sample size. To strengthen causal inferences and reduce selection bias, propensity score matching was also employed.
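The text does not state which matching algorithm was used, so the following is only a generic sketch of one common variant: greedy 1:1 nearest-neighbor matching without replacement on already-estimated propensity scores, with a caliper. The function name `greedy_match`, the unit identifiers, the scores, and the caliper of 0.1 are all hypothetical.

```python
def greedy_match(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, control: dicts mapping unit id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs; units with no control
    within the caliper stay unmatched.
    """
    available = dict(control)  # copy so the caller's dict is not mutated
    pairs = []
    # Process treated units in order of their score for reproducibility.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


# Hypothetical scores; in practice they would come from a fitted logistic model.
treated_ps = {"t1": 0.30, "t2": 0.70, "t3": 0.95}
control_ps = {"c1": 0.32, "c2": 0.68, "c3": 0.10}
matches = greedy_match(treated_ps, control_ps)
```

Here `t3` remains unmatched because no control unit lies within the caliper, which illustrates how matching trades sample size for comparability.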

Qualitative data were analyzed using both deductive and inductive coding methods. Initial codes were informed by UDL theory and literature on implementation science, while emergent themes were developed through constant comparative analysis. The NVivo software supported the coding process and facilitated pattern recognition. Inter-rater reliability was established with an agreement rate of over 85% across independently coded subsets. Quantitative and qualitative findings were synthesized using joint display techniques, meta-inference, and convergent synthesis to identify areas of alignment and divergence. Targeted follow-up analyses were conducted to explore inconsistencies, many of which revealed the nuanced complexity of implementation in real-world settings. The integrated analysis led to empirically grounded recommendations for UDL implementation, combining statistical rigor with a contextual understanding of educational practice.

Assumption checks and supplemental t-tests

In addition to the primary multilevel models (HLM) and growth curve analyses, prespecified pairwise contrasts were examined using independent-samples t-tests as complementary evidence. Prior to each t-test, distributional assumptions were evaluated: normality via Shapiro–Wilk tests, skewness/kurtosis diagnostics, and Q–Q plots; homogeneity of variances via Levene’s test. When variance homogeneity was violated, Welch’s unequal-variances t-test (Welch–Satterthwaite df) was used. If normality was seriously violated or influential outliers were detected, nonparametric sensitivity analyses using Mann–Whitney U and permutation/bootstrapped t-tests (10,000 resamples) were conducted. Given the nested design (students within classes; classes within institutions), supplemental t-tests were either (a) performed on cluster-level aggregates (e.g., class or institution means) or (b) accompanied by cluster-robust standard errors to mitigate inflated Type I error from within-cluster dependence. Where families of related comparisons were reported, Holm-adjusted p-values are provided as a conservative control of familywise error. Results from these supplemental tests were interpreted in conjunction with the HLM estimates to assess convergence and robustness.
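Two of the computations named above, Welch's statistic with Welch–Satterthwaite degrees of freedom and the Holm step-down adjustment, are short enough to sketch directly. The functions below are illustrative stand-ins rather than the study's code, the input values are made up, and the p-value lookup from the t distribution is omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical class-level means for two groups and three raw p-values.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
adj = holm_adjust([0.01, 0.04, 0.03])
```

Applying `welch_t` to cluster-level aggregates, as in option (a) above, keeps the independence assumption plausible. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the same statistic together with a p-value.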

Effect size reporting

For all pairwise comparisons, effect sizes are reported as Cohen’s d with Hedges’ g correction when group sizes were small or unbalanced; for pre–post designs summarized at the same level, \(d_{\text{av}}\) (using the average SD) is provided. When nonparametric tests were primary or used in sensitivity analyses, Cliff’s delta (Δ) is reported. For multilevel models, we report standardized coefficients, semi-partial \(R^{2}\) for fixed effects, Snijders–Bosker \(R^{2}\) indices for within- and between-cluster variance explained, and the intraclass correlation coefficient (ICC) to quantify clustering. For longitudinal growth models, standardized mean-change indices and standardized slope differences are presented. 95% confidence intervals accompany all effect sizes; where clustering or non-normality warranted, CIs were obtained via cluster bootstrap. Interpretation follows conventional thresholds (small/medium/large) while emphasizing practical significance in the context of UDL implementation and equity-relevant outcomes.
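The pairwise effect sizes named above can be sketched as follows. This is a minimal illustration on made-up scores; Hedges' correction uses the common approximation \(J = 1 - 3/(4\,df - 1)\) with \(df = n_1 + n_2 - 2\), and confidence intervals are not shown.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd


def hedges_g(a, b):
    """Cohen's d with the approximate small-sample bias correction."""
    df = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * df - 1))


def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(x > y for x in a for y in b)
    less = sum(x < y for x in a for y in b)
    return (greater - less) / (len(a) * len(b))


# Hypothetical scores for two small groups.
group_a, group_b = [2, 4, 6], [1, 3, 5]
d = cohens_d(group_a, group_b)
g = hedges_g(group_a, group_b)
delta = cliffs_delta(group_a, group_b)
```

With groups this small the correction is substantial (g is 80% of d), which is why the correction matters for small or unbalanced comparisons.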

Qualitative analysis procedures and triangulation

Qualitative data (faculty/administrator semi-structured interviews, student focus groups, ethnographic observations, and institutional documents) were analysed using thematic analysis that combined deductive coding (guided by UDL principles, implementation science, and the ecological framework) with inductive identification of emergent patterns. An initial codebook was developed from theory and study questions, piloted on ~20% of transcripts, and refined through two calibration rounds. Two trained analysts, independent of field implementation, coded all materials while blinded to the sites’ fidelity classifications. Discrepancies were resolved through analytic discussion, with adjudication by a third reviewer as needed. Intercoder agreement exceeded 85% on independently coded subsets; Krippendorff’s alpha was also documented against a target of > 0.80. Credibility was strengthened through data triangulation (interviews, focus groups, observations, documents), analyst triangulation (dual coders plus senior audit), member checking with key informants, reflexive memoing, and maintenance of an audit trail. Thematic saturation was declared when no substantively new codes/themes emerged across three successive data sources. Integrated mixed-methods interpretation employed joint displays to align qualitative themes with quantitative indicators, highlighting areas of convergence and divergence that informed the final implementation framework.
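The two agreement indices mentioned above differ in that percent agreement ignores chance, whereas Krippendorff's alpha corrects for it. The sketch below covers only the simplest case, assuming two coders, nominal codes, and no missing values; the code labels are invented for illustration.

```python
from collections import Counter


def percent_agreement(coder1, coder2):
    """Share of units on which the two coders assigned the same code."""
    return sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)


def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    n = 2 * len(coder1)  # total number of assigned codes
    disagreements = sum(a != b for a, b in zip(coder1, coder2))
    # Each disagreeing unit contributes two off-diagonal coincidences.
    d_observed = 2 * disagreements / n
    counts = Counter(coder1) + Counter(coder2)
    d_expected = sum(
        counts[c] * counts[k] for c in counts for k in counts if c != k
    ) / (n * (n - 1))
    if d_expected == 0:
        raise ValueError("alpha is undefined when only one code is ever used")
    return 1 - d_observed / d_expected


# Hypothetical codes assigned by two analysts to four transcript segments.
coder_a = ["barrier", "barrier", "support", "support"]
coder_b = ["barrier", "barrier", "support", "barrier"]
agreement = percent_agreement(coder_a, coder_b)
k_alpha = krippendorff_alpha_nominal(coder_a, coder_b)
```

In this toy case agreement is 75% while alpha is about 0.53, showing why a chance-corrected index can fall well below raw agreement.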

Validity and reliability considerations

To ensure the trustworthiness of both the methods and findings, the study employed a series of rigorous validation strategies. Quantitative instruments underwent a thorough psychometric evaluation, including confirmatory factor analysis to verify construct validity. Measurement invariance testing was also conducted across diverse participant groups to confirm the consistency of the instruments. Internal consistency reliability was measured using alpha and omega coefficients, with all scales demonstrating acceptable reliability. Test-retest reliability of the main variables was also checked across time points. Several methods were used to enhance the validity of the qualitative findings, including multiple analyst triangulation, member checking with key informants, and prolonged engagement in the research environment. The reliability of the interpretations was further enhanced by discussions with subject matter experts, which lent credibility to the analytical conclusions. Findings were also described in rich detail to help readers judge how they can be applied to other contexts.

Negative case analysis helped the researchers detect and discuss cases that did not follow the predominant patterns, ensuring a more nuanced and holistic interpretation. In addition to these methods of qualitative validation, the study placed a significant focus on the quality of integration between its quantitative and qualitative strands. The mixed-methods design was intended to provide a multidimensional approach to the research questions, pursuing complementarity rather than convergence alone. To further substantiate the findings, the researchers addressed the validity issues associated with each methodological element and ensured that interpretations were transparent and consistent. This integrative approach enhanced the credibility, depth, and practical relevance of the overall study findings.

Ethical considerations

The study protocol was approved by the Ethics Committee of Philippine Christian University prior to participant recruitment and data collection. The study was conducted in full accordance with all applicable ethical guidelines, including the principles of the Declaration of Helsinki. All participants were fully informed about the procedures. For students under 18 years of age, written consent from both parents or legal guardians, as well as the assent of the minors themselves, was required. Vulnerable populations, such as students with disabilities, were given special protection so that their rights and well-being were safeguarded. The consent forms were distributed in various languages and in accessible formats to suit the needs of every participant. The participants were told that they could withdraw at any time without repercussions. All personally identifiable data were removed from the datasets to ensure privacy and confidentiality. All materials were stored securely in accordance with institutional guidelines and national data protection laws. The identities of the participants were protected in the qualitative reporting through the use of pseudonyms, which also preserved coherence in the narrative. Special attention was paid to the confidentiality of faculty participants, as implementation practices are a sensitive professional issue.

The principle of reciprocity guided the study, ensuring a research process that was both mutually beneficial and ethical. The participating institutions received personalized reports on their adoption of UDL, along with evidence-based recommendations tailored to their specific needs. In addition, faculty participants received professional development workshops based on the study’s findings, which supported their instructional development and acknowledged their roles in the study. The study was designed to minimize the burden on participants and maximize the benefits for the participating educational communities. The research methodology developed by the Consortium provided a rigorous and thorough analysis of how UDL is implemented across a wide variety of educational institutions. This approach supported a high level of validity and reliability, and with it both scientific and practical relevance. Through this inclusive and integrative model, the study produced both theoretical and practical results, enabling educators, practitioners, and policymakers to enhance accessibility and student achievement through successful UDL practices.
