An Analytical System for Lifelong Learning Achievements: Integrating EDP-Means Clustering and Edu-ETL Processes

Mhon, Gant Gaw Wutt

URI: https://onlineresource.ucsy.edu.mm/handle/123456789/2814

Date: 2024-06

Abstract:

In education, analyzing academic performance is vital for understanding student learning behaviors, identifying areas that need improvement, and developing targeted interventions to improve educational outcomes. Traditional assessment methods typically depend on simple metrics such as grades or standardized test scores, which often fail to capture the complexities of student proficiency and behavior. To overcome these limitations, educational researchers have increasingly adopted advanced data mining techniques and machine learning algorithms for a more granular and comprehensive analysis of academic performance data. This research proposes an Enhanced Dirichlet Process Means (EDP-Means) clustering algorithm combined with Educational Extract, Transform, Load (Edu-ETL) processes to evaluate academic performance across various educational levels. The proposed approaches aim to offer greater assurance and clarity in evaluating and supporting student achievements throughout their educational journey. The integration of Edu-ETL processes ensures data quality and consistency, preparing educational datasets for thorough analysis. The architecture of the proposed system uses the EDP-Means clustering algorithm, an improvement over the original DP-Means, for enhanced clustering performance. While both algorithms assign data points to clusters based on distance and a threshold, EDP-Means introduces iterative optimization steps for improved accuracy and stability. In the original DP-Means algorithm, the number of clusters and the threshold parameter were typically fixed or set by heuristic choices. In EDP-Means, these parameters are dynamically adjusted based on the data characteristics and clustering quality, leading to more accurate and reliable clustering results. This study demonstrates that EDP-Means performs as well as or better than the traditional K-Means and original DP-Means algorithms in clustering educational data.
To validate the performance of EDP-Means, datasets from different fields were used in further experiments to confirm its effectiveness. Furthermore, the analysis in the PySpark environment shows how PySpark enhances the scalability and efficiency of EDP-Means, particularly when processing large-scale datasets.
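
The distance-and-threshold assignment rule that the abstract attributes to both algorithms can be illustrated with a minimal sketch of the original DP-Means procedure. This is not the EDP-Means algorithm of the thesis, whose optimization steps and dynamic parameter adjustment are not specified here. The function name `dp_means`, the parameter `lam` (a squared-distance threshold), and the toy data are assumptions chosen for illustration only.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Sketch of plain DP-Means (not EDP-Means).

    lam is an assumed squared-Euclidean-distance threshold: a point farther
    than lam from every current center starts a new cluster.
    """
    X = np.asarray(X, dtype=float)
    centers = X.mean(axis=0, keepdims=True)   # start with one global cluster
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            d2 = ((centers - x) ** 2).sum(axis=1)  # squared distance to each center
            j = int(d2.argmin())
            if d2[j] > lam:
                # No center is close enough: open a new cluster at this point.
                centers = np.vstack([centers, x])
                j = len(centers) - 1
            if labels[i] != j:
                labels[i] = j
                changed = True
        # Recompute each center as the mean of its members; drop empty clusters
        # and renumber the labels so they stay contiguous.
        used = np.unique(labels)
        centers = np.array([X[labels == k].mean(axis=0) for k in used])
        labels = np.searchsorted(used, labels)
        if not changed:
            break
    return labels, centers

# Toy data: two well-separated groups of three points each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = dp_means(X, lam=9.0)
print(labels, len(centers))
```

In this sketch the number of clusters is not passed in; it follows from `lam`. A smaller threshold yields more clusters and a larger one fewer, which is the kind of parameter sensitivity that the abstract says EDP-Means addresses by adjusting parameters from the data.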
