Policy analysis combining artificial intelligence and text mining technology from the perspective of educational informatization

Performance comparison of the educational policy text analysis model
The dataset used in this study is Common Crawl, an open web archive project that regularly crawls billions of web pages. Although Common Crawl is not specifically tailored to educational policy texts, its vast coverage spans a wide range of topics, including education. Texts related to educational policies are extracted by filtering on specific keywords (such as “education policy” and “educational reform”). The dataset can be downloaded from the official Common Crawl website. The experimental environment is described in Table 4.
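The keyword-based extraction step described above can be sketched as follows; the keyword list and function names are illustrative, and in practice the input documents would be plain text extracted from Common Crawl records rather than the toy strings shown here.

```python
# Illustrative keyword filter for selecting education-policy texts
# from a corpus of crawled documents (the keyword list is an assumption).
KEYWORDS = ("education policy", "educational reform", "educational informatization")

def is_policy_text(text: str, keywords=KEYWORDS) -> bool:
    """Return True if the document mentions any policy keyword."""
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)

def extract_policy_texts(documents):
    """Keep only documents mentioning at least one policy keyword."""
    return [doc for doc in documents if is_policy_text(doc)]
```

A case-insensitive substring match keeps the sketch simple; a production pipeline would typically add language detection and deduplication before filtering.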
To ensure the accuracy of the experimental results, the model parameters are uniformly configured. The hidden layer size is set to 512 to balance the model’s complexity and processing capability. The batch size is set to 16, and the initial learning rate is set to 0.001 with a learning rate decay strategy (decay rate 0.01). The comparative models chosen for the experiment are Generative Pre-trained Transformer 4 (GPT-4) and the Robustly Optimized BERT Approach (RoBERTa). Selecting GPT-4 and RoBERTa as benchmarks provides a strong reference point for the proposed optimization model: GPT-4 represents the cutting edge of generative and language-understanding capabilities, while RoBERTa, as an optimized version of BERT, offers robust text comprehension. The experimental task is text classification, in which given educational policy texts are assigned to categories such as educational equity, enhancement of educational quality, and technology application; this allows each algorithm’s ability to distinguish between text categories to be assessed. The models are evaluated on accuracy, precision, recall, and F1 score. The experimental results are depicted in Fig. 3.
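The four evaluation metrics used in the experiment can be computed directly from the predicted and true labels. The sketch below implements accuracy together with macro-averaged precision, recall, and F1 for a multi-class task; the function name and label strings are illustrative.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1
    for a multi-class task such as policy-text classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[t] += 1          # t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro averaging weights each category equally, which is a common choice when policy-text categories are imbalanced; the paper does not state which averaging scheme was used, so this is one reasonable reading.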

Fig. 3: a Accuracy; b Precision; c Recall; d F1 score.
In Fig. 3, regarding accuracy, the optimized model in this study achieves an accuracy of 0.756 with a dataset size of 100, outperforming RoBERTa and approaching GPT-4, indicating its excellent performance with smaller datasets. When the dataset size increases to 1000, the accuracy of the optimized model significantly rises to 0.887, surpassing GPT-4 and demonstrating a considerable advantage in handling medium-sized data. With a dataset size of 10,000, the accuracy of the optimized model further improves to 0.910, clearly outperforming the other two models and showing exceptional performance in large data environments. The optimized model exhibits high accuracy across all dataset sizes, especially excelling in larger datasets compared to GPT-4 and RoBERTa. This indicates that the combination and optimization of AI and BERT enhance the model’s semantic understanding and classification capabilities in educational policy text analysis, enabling it to better handle datasets of various sizes.

In terms of precision, with a dataset size of 100, GPT-4 achieves a precision of 0.751, reflecting its robust performance with smaller datasets. When the dataset size increases to 1000, GPT-4’s precision slightly drops to 0.732, which may reflect specific characteristics or challenges in processing medium-sized datasets. With a dataset size of 10,000, GPT-4’s precision significantly increases to 0.877, indicating strong capabilities in large-scale data processing. RoBERTa’s precision with a dataset size of 100 is 0.713, relatively stable but slightly lower than GPT-4. As the dataset size grows to 1000, RoBERTa’s precision slightly declines to 0.701, suggesting challenges in handling medium-sized data. With a dataset size of 10,000, RoBERTa’s precision improves to 0.748, though it still lags behind the other models, especially in large data environments.
The precision of the optimized model with a dataset size of 100 is 0.748, comparable to GPT-4, indicating its competitiveness with smaller datasets. When the dataset size increases to 1000, the precision of the optimized model significantly improves to 0.810, surpassing the other two models and demonstrating superior performance with medium-sized data. With a dataset size of 10,000, the precision of the optimized model further increases to 0.882, slightly higher than GPT-4, showcasing its exceptional performance in large-scale data processing.

Regarding recall, the optimized model achieves recall values of 0.796, 0.879, and 0.870 for the three dataset sizes. Its performance with medium-sized datasets is significantly better than that of the other models. This indicates that through the integration and optimization of AI and BERT, the model is better at capturing relevant information and covering a broader range in educational policy text analysis. In comparison, although GPT-4 performs exceptionally well in large data environments, its recall shows only modest improvement with medium and small datasets, suggesting potential limitations in certain scenarios. RoBERTa generally performs slightly worse across all dataset sizes, particularly with smaller datasets where its recall is notably lower, indicating limitations in coverage.

Regarding F1 score, the optimized model achieves an F1 score of 0.865 with a dataset size of 100, significantly outperforming both GPT-4 and RoBERTa and demonstrating strong overall capability with small datasets. As the dataset size increases to 1000, the F1 score of the optimized model rises further to 0.907, showcasing outstanding performance with medium-sized datasets and surpassing the other two models. With a dataset size of 10,000, the F1 score of the optimized model continues to improve to 0.921, demonstrating significant advantages in large-scale data environments and maintaining exceptional overall performance.
Case study based on the analysis of educational policy text models
To delve deeper into the application of AI and text mining technology in educational policy analysis, a representative educational policy text, the “National Education Informatization 2.0 Action Plan,” is selected. This policy aims to promote the development of educational informatization and improve educational quality and efficiency. The case study demonstrates how AI and text mining technology can be used to conduct an in-depth analysis of policy texts, uncover their potential effects and existing issues, and discuss them from a psychological perspective. The unit of analysis is the paragraph: as complete units of thought, paragraphs better represent the logical structure, core ideas, and key arguments within policy texts. The keyword analysis of the policy text is shown in Table 5.
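Segmenting the policy document into paragraph-level analysis units, as described above, can be sketched as follows; the splitting rule (one or more blank lines between paragraphs) is an assumption about the document format, not a detail given in the text.

```python
import re

def split_paragraphs(document: str):
    """Split a policy text into paragraph-level analysis units
    on one or more blank lines, discarding empty fragments."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
```

Each returned paragraph can then be fed to downstream steps such as keyword counting or sentiment tagging.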
In Table 5, the theme of “Technology Integration” (mentioned 43 times, with a positive sentiment) is highlighted the most, indicating that the policy text places significant emphasis on using technology to enhance the quality of education. “Development of Educational Resources” (mentioned 37 times, with a positive sentiment) and “Construction of Online Learning Platforms” (mentioned 21 times, with a positive sentiment) are also key areas of focus in the policy, showing that policymakers aim to enhance the accessibility of educational resources and diversify learning methods through these measures. However, the discussion on “Educational Equity” (mentioned 15 times, with a neutral sentiment) reveals that while the policy text mentions the intention to promote educational equity, it lacks specific implementation measures, which may affect the realization of policy objectives. The analysis of the psychological impact of education policy on stakeholders is presented in Table 6.
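The theme-mention counts underlying Table 5 can be approximated with a simple keyword tally over the paragraph units. The theme names below match Table 5, but the keyword lists attached to each theme are illustrative assumptions; the paper does not specify its coding scheme.

```python
from collections import Counter

# Illustrative theme-to-keyword mapping (assumed, not from the paper).
THEMES = {
    "Technology Integration": ["technology integration", "smart classroom"],
    "Educational Equity": ["educational equity", "equal access"],
}

def count_theme_mentions(paragraphs, themes=THEMES):
    """Tally how often each theme's keywords occur across paragraphs."""
    counts = Counter()
    for para in paragraphs:
        lowered = para.lower()
        for theme, kws in themes.items():
            counts[theme] += sum(lowered.count(kw) for kw in kws)
    return counts
```

The sentiment labels in Table 5 would come from a separate sentiment classifier applied to the paragraphs in which each theme occurs; that step is omitted here.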
In Table 6, the expected and actual impact data for the student dimension are obtained primarily through the analysis of student performance and feedback data; the teacher dimension data are derived from evaluations of teaching outcomes; the parent dimension data are gathered from parental feedback and platform usage data; and the school administrator dimension data are based on feedback on educational resource allocation efficiency and on resource utilization data after policy implementation. The policy has a significant positive impact on students’ learning motivation (expected impact: high; actual impact: high), which may be related to the widespread use of online learning platforms and abundant educational resources. For teachers, although job satisfaction improves somewhat (expected impact: high; actual impact: moderate), some pressure remains, possibly from the need to adapt to new technologies and teaching methods. For parents, the increased convenience of information access does not significantly enhance parental involvement in education (actual impact: low), indicating the need for further consideration of how to effectively promote home–school cooperation during policy implementation. The usage of online learning platforms and the student psychological state survey are presented in Table 7.
Table 7 shows that although most students use online learning platforms daily (85% use them every day) and the majority report increases in learning motivation (71%) and learning efficiency (66%), the rise in learning pressure (39% reported an increase) is also a significant concern. These data indicate that while online learning offers flexibility and rich resources, it also brings additional pressure to some students, particularly those with weaker self-regulation abilities. Policymakers and educators therefore need to consider how to provide appropriate support and intervention measures to help students cope with the challenges online learning may bring.
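The percentages reported in Table 7 are straightforward aggregates over individual survey responses. A minimal sketch of that aggregation is shown below; the response field names are hypothetical, since the survey instrument is not described in detail.

```python
def survey_percentage(responses, key, value):
    """Percentage of respondents whose answer for `key` equals `value`,
    rounded to the nearest whole percent (the form used in Table 7)."""
    matches = sum(1 for r in responses if r.get(key) == value)
    return round(100 * matches / len(responses))
```

Applied to the full response set, calls such as `survey_percentage(responses, "usage", "daily")` would yield figures like the 85% daily-usage rate cited above.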
The analysis of the “National Education Informatization 2.0 Action Plan” using AI technology and text mining techniques reveals the key themes and potential impacts of the policy, as well as identifying some existing issues and challenges. Specifically, analyzing the psychological impact of the policy on stakeholders provides important aspects to consider in policy optimization and implementation. Future work should further deepen the analysis of education policies, particularly focusing on achieving educational equity, enhancing teacher job satisfaction, and optimizing the online learning environment to promote the effective implementation and continuous improvement of education informatization policies.