Accepted Papers
Full Research Papers
**Full Conference Program will be posted soon**
*Please note that these titles and abstracts are listed with pre-publication information and may be subject to change. Updates to titles and/or abstracts will be posted here as they become available.*
Authors | Title | Abstract |
Jeroen Ooge, Arno Vanneste, Maxwell Szymanski and Katrien Verbert | Designing Visual Explanations and Learner Controls to Engage Adolescents in AI-Supported Exercise Selection | E-learning platforms that personalise content selection with AI are often criticised for lacking transparency and controllability. Researchers have therefore proposed solutions such as open learner models and letting learners select from ranked recommendations, which engage learners before or after the AI-supported selection process. However, little research has explored how learners - especially adolescents - could engage during such AI-supported decision-making. To address this open challenge, we iteratively designed and implemented a control mechanism that enables learners to steer the difficulty of AI-compiled exercise series before practice, while interactively analysing their control's impact in a 'what-if' visualisation. We evaluated our prototypes through four qualitative studies involving adolescents, teachers, EdTech professionals, and pedagogical experts, focusing on different types of visual explanations for recommendations. Our findings suggest that 'why' explanations do not always meet the explainability needs of young learners but can benefit teachers. Additionally, 'what-if' explanations were well-received for their potential to boost motivation. Overall, our work illustrates how combining learner control and visual explanations can be operationalised on e-learning platforms for adolescents. Future research can build upon our designs for 'why' and 'what-if' explanations and verify our preliminary findings. |
Ilya Musabirov, Mohi Reza, Haochen Song, Steven Moore, Pan Chen, Harsh Kumar, Tong Li, John Stamper, Norman Bier, Anna Rafferty, Thomas Price, Nina Deliu, Audrey Durand, Michael Liut and Joseph Jay Williams | Platform-based Adaptive Experimental Research in Education: Lessons Learned from Learning Challenge | Adaptive Experimentation is one of the most promising approaches to support complex decision-making in learning experience design and delivery. This paper reports on our experience with a real-world, multi-experimental evaluation of an adaptive experimentation platform within an Anonymous Learning Challenge framework, and summarizes data-driven lessons learned and best practices for Adaptive Experimentation in education. We outline the key scenarios of the applicability of platform-supported experiments and reflect on lessons learned from this two-year project, focusing on implications relevant to platform developers, researchers, practitioners, and policy stakeholders to integrate adaptive experiments in real-world courses. |
Grace D Jaiyeola, Aaron Wong, Richard Bryck, Caitlin Mills and Stephen Hutt | One Size Does Not Fit All: Considerations when using Webcam-Based Eye Tracking to Model Neurodivergent Learners' Attention and Comprehension | This study investigates the use of webcam-based eye tracking to model attention and comprehension in both neurotypical and neurodivergent learners. Leveraging WebGazer, a previously used online data collection tool, we collected gaze and interaction data (N=354) during online reading tasks to explore Task Unrelated Thought (TUT) and comprehension in an ecologically valid setting. Our findings challenge the "one size fits all" approach to learner modeling by demonstrating distinct differences in indicators of both constructs between neurotypical and neurodivergent learners. We compared general models trained on the entire population with tailored models specific to neurodivergent and neurotypical groups. Results indicate that diagnosis-specific models provide more accurate predictions (AUROCs of .59-.70 vs. .57 for the general model), and through SHAP analysis, we note that the strongest indicators of each construct vary as the training population is refined, highlighting the limitations of generalized approaches. This work supports the scalability of webcam-based cognitive modeling and underscores the potential for personalized learning analytics and modeling to better support diverse learning needs. |
Bowen Hui, Opey Adeyemi, Kiet Phan, Justin Schoenit, Seth Akins and Keyvan Khademi | Diversity Considerations in Team Formation Design, Algorithm, and Measurement | Building teams that foster equitable interaction provides the foundation for a positive collaborative learning experience. Existing literature shows that many context-specific algorithms exist to help instructors form teams automatically in large classes, but the field lacks general guidelines for selecting a suitable algorithm in a given pedagogical context and lacks a general evaluation approach that allows for the methodological comparison of these algorithms. This paper presents a general-purpose team formation algorithm that considers diversity and inclusion in its design. We also describe an evaluation framework with diversity metrics to assess team compositions using synthetically generated student data and real class data. Our simulation and classroom experiments show that our algorithm performs competitively against three state-of-the-art algorithms. We hope this work contributes to building a more equitable and collaborative learning environment for students. |
Stella Xin Yin, Zhengyuan Liu, Dion Hoe-Lian Goh, Choon Lang Quek and Nancy F. Chen | Scaling Up Collaborative Dialogue Analysis: An AI-driven Approach to Understanding Dialogue Patterns in Computational Thinking Education | Pair programming is a collaborative activity that enhances students' computational thinking (CT) skills. While analyzing students' interactions provides valuable insights into effective collaboration, prior studies have relied heavily on manual transcription and coding. Recent advancements in speech and language processing offer opportunities to automate and scale up the analysis of classroom dialogues. Moreover, previous work has mainly focused on task-related interactions, with little attention to social interactions. To address these gaps, we conducted a four-week CT course with 26 fifth-grade primary school students, who worked in pairs to solve CT tasks. We recorded their discussions and transcribed them with a speech processing pipeline. Next, we developed a coding scheme and applied large language models for annotation. After identifying the dialogue patterns, we investigated the impact of these patterns on CT performance. Our study demonstrates the potential of leveraging AI technologies to analyze classroom dialogues. Four clusters of dialogue patterns were identified. We observed that Inquiry and Constructive Collaboration patterns positively influenced students' CT skills, while Disengagement and Disputation patterns were associated with lower CT performance. This study contributes to the understanding of how dialogue patterns relate to CT performance and provides implications for both research and educational practice in CT education. |
Julie Le Tallec, Ethan Prihar and Tanja Käser | TeamTeachingViz: Benefits, Challenges, and Ethical Considerations of Using a Multimodal Analytics Dashboard to Support Team Teaching Reflection | Team teaching in higher education can be challenging, especially for educators managing large classes with limited pedagogical training and few opportunities to reflect on their practices. Emerging sensing technologies and analytics can capture and analyse patterns of collaboration, communication, and movement in team teaching. Yet, few studies have presented these data to educators for reflection. To address this gap, we examine the benefits, challenges, and concerns of presenting multimodal teaching data (positional, audio, and spatial pedagogy observations) to educators via the TeamTeachingViz dashboard. We evaluated TeamTeachingViz in an authentic classroom context where educators explored their own data and team teaching strategies. Multimodal data was collected from 36 in-the-wild classroom sessions involving 12 educators grouped in various combinations over 4 weeks, followed by semi-structured interviews to reflect on their practices. Findings suggest that educators improved their self-awareness by using data-driven insights to understand their movements and interactions, enabling continuous improvement in team teaching. However, they noted the need for additional data, such as student behaviours and speech content, to better contextualise these insights. |
Lixiang Yan, Dragan Gasevic, Vanessa Echeverria, Yueqiao Jin, Linxuan Zhao and Roberto Martinez-Maldonado | From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning | Multimodal Learning Analytics (MMLA) leverages advanced sensing technologies and artificial intelligence to capture complex learning processes, but integrating diverse data sources into cohesive insights remains challenging. This study introduces a novel methodology for integrating latent class analysis (LCA) within MMLA to map monomodal behavioural indicators into parsimonious multimodal ones. Using a high-fidelity healthcare simulation context, we collected positional, audio, and physiological data, deriving 17 monomodal indicators. LCA identified four distinct latent classes: Collaborative Communication, Embodied Collaboration, Distant Interaction, and Solitary Engagement, each capturing unique monomodal patterns. Epistemic network analysis compared these multimodal indicators with the original monomodal indicators and found that the multimodal approach was more parsimonious while offering higher explanatory power regarding students' task and collaboration performances. The findings highlight the potential of LCA in simplifying the analysis of complex multimodal data while capturing nuanced, cross-modality behaviours, offering actionable insights for educators and enhancing the design of collaborative learning interventions. This study proposes a pathway for advancing MMLA, making it more parsimonious and manageable, and aligning with the principles of learner-centred education. |
Sylvio Rüdian, Julia Podelo, Jakub Kužílek and Niels Pinkwart | Feedback on Feedback: Students' Perceptions of Feedback from Teachers and Few-Shot LLMs | Large language models (LLMs) can be a valuable resource for generating texts and performing various instruction-based tasks. In this paper, we explored the use of LLMs, particularly for generating feedback for students in higher education. More precisely, we conducted an experiment to examine students' perceptions of LLM-generated feedback, with the overall aim of assisting teachers in the feedback creation process. First, we examined students' perceptions of the feedback they received without being aware of whether it was created by their teacher or an LLM. Our results reveal that the feedback source did not impact how it was perceived by the students, except in cases where repetitive content had been generated, which is a known limitation of LLMs. Second, students were asked to identify whether the feedback came from an LLM or the teacher. The results demonstrate that students were unable to identify the feedback source, although a small subset of indicators was identified that clearly revealed where the feedback came from. Third, we analyzed student perceptions when students knew that the feedback had been auto-generated. This examination indicates that generated feedback is likely to be met with resistance and emphasizes the need for a teacher-in-the-loop approach. |
Rafael Ferreira Mello, Cleon Pereira Junior, Luiz Rodrigues, Filipe Dwan Pereira, Luciano Cabral, Newarney Costa, Geber Ramalho and Dragan Gasevic | Automatic Short Answer Grading in the LLM Era: Does GPT-4 with Prompt Engineering beat Traditional Models? | Assessing short answers in educational settings is challenging due to the need for scalability and accuracy, which led to the field of Automatic Short Answer Grading (ASAG). Traditional machine learning models, such as ensemble methods and embedding-based models, have been widely researched in ASAG, but they often suffer from generalizability issues. Recently, Large Language Models (LLMs) emerged as an alternative to optimize ASAG systems. However, previous research has failed to present a comprehensive analysis of LLMs' performance powered by prompt engineering strategies or to compare their capabilities to traditional models. This study presents a comparative analysis between traditional machine learning models and GPT-4 in the context of ASAG. We investigated the effectiveness of different models and text representation techniques and explored prompt engineering strategies for LLMs. The results indicate that traditional machine learning models outperform LLMs. However, GPT-4 showed promising capabilities, especially when configured with optimized prompt components, such as few-shot examples and clear instructions. This study contributes to the literature by providing a detailed evaluation of LLM performance compared to traditional machine learning models in a multilingual ASAG context, offering insights for developing more efficient automatic grading systems. |
Manika Garg and Anita Goel | Towards Fair Assessments: A Machine Learning-based Approach for Detecting Cheating in Online Assessments | Academic cheating poses a significant challenge to conducting fair online assessments. One common form is collusion, where students unethically share answers during the assessment. While several researchers have proposed solutions, there is a lack of clarity regarding which of the five types of collusion they target. Researchers have used statistical techniques to analyze basic attributes collected by the platforms for collusion detection. Only a few works have used machine learning, and these considered only two or three attributes; the use of limited features leads to reduced accuracy and an increased risk of false accusations. In this work, we focus on In-Parallel Collusion, where students simultaneously work together on an assessment. For data collection, a quiz tool was adapted to capture clickstream data at a finer level of granularity. We use feature engineering to derive seven features and create a machine learning model for collusion detection. The results show that: 1) Random Forest exhibits the best accuracy (98.8%), and 2) in contrast to the fewer features used in earlier works, the full feature set provides the best result, showing that considering multiple facets of similarity enhances model accuracy. The findings provide platform designers and teachers with insights into optimizing quiz platforms and creating cheat-proof assessments. |
Elias Hedlin, Ludwig Estling, Jaqueline Wong, Carrie Demmans Epp and Olga Viberg | Got It! Prompting Readability Using ChatGPT to Enhance Academic Texts for Diverse Learning Needs | Reading skills are crucial for students' success in education and beyond. However, reading proficiency among K-12 students has been declining globally, including in Sweden, leaving many underprepared for post-secondary education. Additionally, an increasing number of students have reading disorders, such as dyslexia, which require support. Generative artificial intelligence (genAI) technologies, like ChatGPT, may offer new opportunities to improve reading practices by enhancing the readability of educational texts. This study investigates whether ChatGPT-4 can simplify academic texts and which prompting strategies are most effective. We tasked ChatGPT with rewriting 136 academic texts using four prompting approaches: Standard, Meta, Roleplay, and Chain-of-Thought. All strategies improved text readability, with Meta performing the best overall and the Standard prompt sometimes creating texts that were less readable than the original. This study found variability in the simplified texts, suggesting that different strategies should be used based on the specific needs of individual learners. Overall, the findings highlight the potential of genAI tools, like ChatGPT, to improve the accessibility of academic texts, offering valuable support for students with reading difficulties and promoting more equitable learning opportunities. |
Valdemar Švábenský, Conrad Borchers, Elizabeth B. Cloude and Atsushi Shimada | Evaluating the Impact of Data Augmentation on Predictive Model Performance | In supervised machine learning (SML) research, large training datasets are essential for valid results. However, obtaining primary data in learning analytics (LA) is challenging. Data augmentation can address this by expanding and diversifying the data, though its use in LA remains underexplored. This paper systematically compares data augmentation techniques and their impact on prediction performance in a typical LA task: prediction of academic outcomes. Augmentation is demonstrated on four SML models, which we successfully replicated from a previous LAK study, achieving AUCs equivalent to the original. Among 21 augmentation techniques, SMOTE-ENN sampling performed the best, reducing training time to 55% of the baseline and improving the average AUC by 0.01 over the baseline models. In addition, we compared 99 combinations of chaining the 21 techniques and found minor (+0.014) improvements across models when adding noise to SMOTE-ENN. Nevertheless, we also caution that some augmentation techniques significantly lowered predictive performance or increased performance fluctuation related to random chance. This paper's contribution is twofold. Primarily, our empirical findings show that sampling techniques may provide the most tangible performance improvements for LA applications of SML. Second, the LA community may benefit from validating a recent study through independent replication. |
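For context on the sampling technique this abstract highlights: SMOTE-ENN chains SMOTE oversampling of the minority class with Edited Nearest Neighbours cleaning of noisy samples. Below is a minimal, hypothetical sketch assuming the imbalanced-learn and scikit-learn packages and synthetic data in place of the paper's dataset; it is not the authors' code.

```python
# Sketch: augment an imbalanced training split with SMOTE-ENN, then train.
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for a tabular LA dataset with an imbalanced outcome.
X, y = make_classification(n_samples=1000, weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Augment only the training data; the test split stays untouched.
X_aug, y_aug = SMOTEENN(random_state=0).fit_resample(X_train, y_train)

model = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```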
Tianyuan Yang, Baofeng Ren, Chenghao Gu, Boxuan Ma, Tianjia He and Shin'Ichi Konomi | Towards Better Course Recommendations: Integrating Multi-Perspective Meta-Paths and Knowledge Graphs | Course recommender systems demonstrate their potential in assisting students with course selection and effectively alleviating the problem of information overload. Current course recommender systems focus predominantly on collaborative information and fail to consider the multi-perspective information and the bi-directional relationship between students and courses. This paper introduces a novel Multi-perspective Aware Explainable Course Recommendation model (MAECR) that leverages knowledge graphs and multi-perspective meta-paths to enhance both the accuracy and explainability of course recommendations. Through dual-side modeling of both the student and the course along each meta-path, MAECR can identify and understand the interests and needs of students in each course, as well as evaluate the attractiveness and suitability of the courses for individual students. Following the dual-side modeling for each meta-path, we aggregate the multi-perspective meta-paths of each student and course using a carefully designed attention mechanism. The attention weights generated by this mechanism serve as explanations for the recommendation results, representing the preference score for each perspective. MAECR thus provides personalized and explainable recommendations. Comprehensive experiments demonstrate the effectiveness and improved interpretability of the proposed model. |
Ziqing Li, Mutlu Cukurova and Sahan Bulathwela | A Novel Approach to Scalable and Automatic Topic-Controlled Question Generation in Education | The development of Automatic Question Generation (QG) models has the potential to significantly improve educational practices by reducing the teacher workload associated with creating educational content. This paper introduces a novel approach to educational question generation that controls the topical focus of questions. Topic-Controlled Question Generation (T-CQG) thereby enhances the relevance and effectiveness of the generated content for educational purposes. Our approach uses fine-tuning on a pre-trained T5-small model, employing specially created datasets to cater to educational needs. We specifically address the challenge of generating questions semantically aligned with paragraph-level contexts, enhancing the topic specificity of the generations. In addition, we introduce and explore novel evaluation methods to assess the topical relatedness of the generated questions. Our results, validated through rigorous offline and human-backed evaluations, demonstrate that the proposed models effectively generate high-quality, topic-focused questions that have the potential to reduce teacher workload as well as support personalised tutoring systems as a bespoke question generator. With its relatively small number of parameters, our proposal not only advances the capabilities of question generation models for handling specific educational topics but also offers a scalable solution that reduces infrastructure costs. |
Insub Shin, Subhin Hwang, Yunjoo Yoo, Sooan Bae and Raeyeong Kim | Comparing Student Preferences for AI-Generated and Peer-Generated Feedback in AI-driven Formative Peer Assessment | Formative assessment has the advantage of enhancing student learning and improving teaching practices through evaluation. However, there are practical obstacles, such as time constraints and students' passive participation, and the low quality of peer feedback can also be an issue. Artificial intelligence (AI) has been explored for its potential to automate grading and provide timely feedback, making it a valuable tool in formative assessment. However, there is still limited research on how AI can be effectively used in the context of formative peer assessment. In this study, we conducted an AI-driven formative peer assessment with 108 high school students, where AI actively participated in the peer evaluation process. Trace data were systematically collected, along with dispositional data gathered through self-report surveys. The collected data were used to analyze differences in preference between AI-generated and peer-generated feedback. In scenarios where student participation was low or the quality of peer feedback was insufficient, students showed a higher preference for AI-generated feedback, demonstrating its potential utility. However, students with high Math Confidence and AI Interest preferred peer-generated feedback over AI-generated feedback. Based on these findings, we propose practical strategies for implementing AI-driven formative peer assessment. |
Mohammad Khalil and Paul Prinsloo | The lack of generalisability in learning analytics research: why, how does it matter, and where to? | Concerns about the lack of impact of learning analytics (LA) research have been part of the evolution of the field since its emergence as a research focus and practice in 2011. The preponderance of small-scale studies and the exploratory nature of much LA research are well-documented contributing factors to the lack of generalisability, transferability, replicability and scalability in LA research. Through an analysis of 144 full research papers published in the conference proceedings of LAK '22, '23 and '24, this paper provides an overview of the extent and contours of the lack of generalisability in LA research and pointers for making LA research more generalisable. The inductive and deductive analysis of the recent three LAK conferences provides evidence that a significant percentage (46%) of the corpus papers do not refer at all to generalisability or transferability, while few papers report on the scalability of their research findings. While the crisis of replicability/reproducibility is a wider concern in the broader context of research, considering and reporting on generalisability and transferability is integral to scientific rigour. We conclude our paper with a range of pointers for addressing the lack of generalisability in LA research. |
Alex Barrett, Fengfeng Ke, Nuodi Zhang, Chih-Pu Dai, Saptarshi Bhowmik and Xin Yuan | Pattern analysis of ambitious science talk between preservice teachers and AI-powered student agents | New frontiers in simulation-based teacher training have been unveiled with the advancement of artificial intelligence (AI). Integrating AI into virtual student agents increases the accessibility and affordability of teacher training simulations, but little is known about how preservice teachers interact with AI-powered student agents. This study analyzed the discourse behavior of 15 preservice teachers who undertook simulation-based training with AI-powered student agents. Using a framework of ambitious science teaching, we conducted a pattern analysis of teacher and student talk moves, looking for evidence of academically productive discourse. Comparisons were made with patterns found in real classrooms with professionally trained science teachers. Results indicated that preservice teachers generated academically productive discourse with AI-powered students by using ambitious talk moves. The pattern analysis also revealed coachable moments where preservice teachers succumbed to cycles of unproductive discourse. This study highlights the utility of analyzing classroom discourse to understand human-AI communication in simulation-based teacher training. |
Qinyi Liu, Ronas Shakya, Mohammad Khalil and Jelena Jovanovic | Advancing privacy in learning analytics using differential privacy | This paper addresses the challenge of balancing learner data privacy with the use of data in learning analytics (LA) by proposing a novel framework based on Differential Privacy (DP). The need for more robust privacy protection keeps increasing, driven by evolving legal regulations and heightened privacy concerns, as well as by the insufficiency of traditional anonymization methods for the complexities of educational data. To address this, we introduce the first DP framework specifically designed for LA and provide practical guidance for its implementation. We demonstrate the use of this framework through an LA usage scenario and validate DP in safeguarding data privacy against potential attacks through an experiment on a well-known LA dataset. Additionally, we explore the trade-offs between data privacy and utility across various DP settings. Our work contributes to the field of LA by offering a practical DP framework that can support researchers and practitioners in adopting DP in their work. |
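As background for the DP mechanics discussed in this abstract, the sketch below shows the Laplace mechanism, a standard DP building block: noise scaled to the query's sensitivity divided by the privacy budget epsilon is added to an aggregate before release. It is an illustrative toy with hypothetical grade data, not the framework proposed by the authors.

```python
# Sketch: release a differentially private mean of bounded student grades.
import numpy as np

def dp_mean(values, lower, upper, epsilon, seed=0):
    """Laplace mechanism applied to the mean of values clipped to [lower, upper]."""
    rng = np.random.default_rng(seed)
    clipped = np.clip(values, lower, upper)
    # Sensitivity of the mean: one record can shift it by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return clipped.mean() + rng.laplace(scale=sensitivity / epsilon)

grades = [62, 75, 88, 54, 91, 70, 83]  # hypothetical course grades
print(dp_mean(grades, lower=0, upper=100, epsilon=1.0))  # smaller epsilon = noisier
```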
Jaclyn Ocumpaugh, Nidhi Nasiar, Andres Felipe Zambrano, Alex Goslen, Jessica Vandenberg, Jordan Esiason, Jonathan Rowe and Stephen Hutt | Refocusing the lens through which we view affect dynamics: The Skills, Difficulty, Value, Efficacy and Time Model | For more than a decade, a handful of theoretical models have shaped a substantial amount of the research related to students' emotional experiences during learning. This research has been productive, but articulating the underlying implicit assumptions in existing theories and their implications for our empirical interpretations can help us better investigate the reciprocal relationships between learning and emotion and, subsequently, develop better interventions. This paper expands upon the existing theoretical frameworks, increasing the types of questions we ask about affect dynamics. We do so within the context of [REDACTED], a virtual world that allows middle school students to investigate microbiology questions. Specifically, we use this data to examine and revise the assumptions that are implicit in these models and the methods we use to investigate them. |
Jaclyn Ocumpaugh, Xiner Liu and Andres Felipe Zambrano | Language Models and Dialect Differences | The advancements in automatic language processing being ushered in by Large Language Models suggest enormous potential for better personalization during student learning. However, this potential can be best exploited if we know that LLMs are equally capable of interacting with students who speak or write in a range of different dialects. This case study uses systematically manipulated student essays, previously evaluated by human raters, to examine how ChatGPT responds to and addresses specific dialect differences. Results point to important research questions about the potential biases and limitations of both LLMs and humans when evaluating and providing feedback to students who use minoritized dialects. Addressing these concerns is critical for the field of learning analytics as it seeks to ensure equity and asset-based approaches to learning analytics. |
Zheng Fang, Weiqing Wang, Guanliang Chen and Zachari Swiecki | The Company You Keep: Refining Neural Epistemic Network Analysis | Collaborative problem-solving (CPS) is an inherently sociocognitive phenomenon. Despite this, extant learning analytic techniques tend to focus on either the social or cognitive aspects without explicitly considering their interaction. Prior work developed Neural Epistemic Network Analysis (NENA), which used a combination of deep learning methods to simultaneously model the social and cognitive aspects of CPS; however, the method had several limitations. The refined version of NENA presented here addresses these limitations by (a) introducing a simplified autoencoder deep learning architecture; (b) using a combination of social and epistemic networks as input to preserve interpretability in terms of social and cognitive factors; and (c) introducing an isometry loss function to ensure downstream statistical tests are meaningful. We found that the refined version of NENA is able to achieve high performance on criteria we would expect from a network analytic technique in the context of learning analytics: interpretability, goodness of fit, orthogonality and isometry, and discriminatory power. We also demonstrated that this method was comparable in performance to a more traditional learning analytic technique, Epistemic Network Analysis (ENA), while providing information that ENA did not. |
Melissa Lee, Kevin Huang, Kelly Collins and Mingyu Feng | Examining the Relationship between Math Anxiety, Effort, and Learning Outcomes Using Latent Class Analysis | Math anxiety has been found to negatively correlate with math achievement, affecting students' choices to take fewer math classes and avoid math educational opportunities. Educational technology tools can ameliorate some of the negative effects of math anxiety. Here, we examined students' math anxiety, effort in an educational technology platform, and resulting math achievement. This study used multilevel latent class analysis to understand student profiles of math anxiety and examined how students of different profiles interacted with [Program], an adaptive intelligent tutor that provides affective support to students during math problem solving, with a focus on the effort students exerted when using [Program], e.g., giving up, skipping, using hints. We examined students' learning outcomes on a standardized math assessment. Our analysis indicates that students were in one of three groups: Highly Anxious, Performance Anxious, and Calm. Highly Anxious students tended to give up more often when solving questions in [Program] and had the lowest math achievement outcomes. For these students, using hints to solve problems in [Program] was significantly associated with increased math outcomes. This suggests that for students with the highest levels of math anxiety, encouragement to use hints in educational technology programs could be related to improved learning outcomes. |
Zhangqi Duan, Nigel Fernandez, Alexander Hicks and Andrew Lan | Test Case-Informed Knowledge Tracing for Open-ended Coding Tasks | Open-ended coding tasks, which ask students to construct programs according to certain specifications, are common in computer science education. Student modeling can be challenging since the open-ended nature of these tasks means that student code can be diverse. Traditional knowledge tracing (KT) models that only analyze response correctness may not fully capture nuances in student knowledge from student code. In this paper, we introduce Test case-Informed Knowledge Tracing for Open-ended Coding (TIKTOC), a framework to simultaneously analyze and predict both open-ended student code and whether the code passes each test case. We augment the existing CodeWorkout dataset with the test cases used for a subset of the open-ended coding questions, and propose a multi-task learning KT method to simultaneously analyze and predict 1) whether a student's code submission passes each test case and 2) the student's open-ended code, using a large language model as the backbone. We quantitatively show that these methods outperform existing KT methods for coding that only use the overall score a code submission receives. We also qualitatively demonstrate how test case information, combined with open-ended code, helps us gain fine-grained insights into student knowledge. |
Alexander Scarlatos, Ryan Baker and Andrew Lan | Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs | Recent advances in large language models (LLMs) have led to the development of artificial intelligence (AI)-powered tutoring chatbots, showing promise in providing broad access to high-quality personalized education. Existing works have primarily studied how to make LLMs follow tutoring principles but not how to model student behavior in dialogues. However, analyzing student dialogue turns can serve as a formative assessment, since open-ended student discourse may indicate their knowledge levels and reveal specific misconceptions. In this work, we present a first attempt at performing knowledge tracing (KT) in tutor-student dialogues. We propose LLM prompting methods to identify the knowledge components/skills involved in each dialogue turn and diagnose whether the student responds correctly to the tutor, and verify the LLM's effectiveness via an expert human evaluation. We then apply a range of KT methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues. We perform extensive qualitative analyses to highlight the challenges in dialogue KT and outline multiple avenues for future work. |
Yixin Cheng, Rui Guan, Tongguang Li, Mladen Raković, Xinyu Li, Yizhou Fan, Flora Jin, Yi-Shan Tsai, Dragan Gašević and Zachari Swiecki | Self-regulated Learning Processes in Secondary Education: A Network Analysis of Trace-based Measures | While self-regulation is crucial for secondary school students, prior studies often rely on self-report surveys and think-aloud protocols that present notable limitations in capturing self-regulated learning (SRL) processes. This study advances the understanding of SRL in secondary education by using trace data to examine SRL processes during multi-source writing tasks, with higher education participants included for comparison. We collected fine-grained trace data from 66 secondary school students and 59 university students working on the same writing tasks within a shared SRL-oriented learning environment. We used Bannert's validated SRL coding scheme to label the data, reflecting specific SRL processes, and examined their relationship with essay performance and educational levels. Using epistemic network analysis (ENA) to model and visualise the interconnected SRL processes, we found that: (a) secondary school students predominantly engaged in Orientation, Re-reading, and Elaboration/Organisation; (b) high-performing secondary students engaged more in Re-reading, while low-performing students showed more Orientation; and (c) higher education students exhibited more diverse SRL processes, such as Monitoring and Evaluation, than secondary students, who mainly followed task instructions and rubrics. These findings highlight the necessity of designing scaffolding tools and developing teacher training programs to enhance awareness and development of SRL skills for secondary school learners. |
Lina Zhong, Weijie Lang, Jia Rong, Guanliang Chen and Miao Fan | Enhancing Motivation and Learning in Primary School History Classrooms: The Impact of Virtual Reality | Traditional classroom teaching often fails to convey cultural heritage effectively due to the limitations of spatial and temporal scales, making it difficult for students to fully engage with and appreciate the historical content. Virtual reality (VR) technology, in contrast, allows for human-centered, immersive presentations of cultural heritage, offering a digital experience that can be especially valuable when physical access is limited. The key question explored in this study is whether VR can improve students' performance in cultural lessons compared to traditional teaching methods. A total of 228 primary school students (Grades 5-6) were randomly assigned to either a high-visual experience group (VR, 360° video) or one of two low-visual experience groups (static video, textbook). The results indicated that students in the high-visual experience group exhibited greater intrinsic motivation and learning gains than those in the low-visual experience groups. Additionally, the study found that negative user experiences significantly moderated the relationship between intrinsic motivation and learning gains. These findings provide empirical evidence supporting the integration of VR into traditional classroom teaching, demonstrating its potential to enhance student engagement and learning outcomes in history and cultural education. |
Saleh Ramadhan Alghamdi, Kaixun Yang, Yizhou Fan, Dragan Gasevic, Guanliang Chen and Mladen Rakovic | Analytics of Temporal Patterns of Self-regulated Learners: A Time Series Approach | Temporal patterns play a significant role in understanding dynamic changes in Self-regulated Learning (SRL) engagement over time. Several previous studies have proposed approaches for automated detection of SRL strategies through analysis of temporal patterns. However, these approaches are mostly focused on the analysis of patterns in sequential ordering of SRL processes. This offers a useful yet limited temporal perspective to SRL. As noted in the literature, temporality of SRL has two dimensions -- passage of time and ordering of events. To address this gap, this paper proposes a time series approach that can automatically detect SRL strategies by accounting for both dimensions of temporality. Our approach also explores when specific processes occur and how learners engage metacognitively or cognitively with learning tasks. In particular, this study investigated SRL engagement as students composed essays using multiple sources within a 120-minute time frame. The results indicated that five distinct strategies with varying levels of engagement were detected. The correlation between these identified strategies and students' scores was not statistically significant; however, further exploration revealed that students who adopted a specific strategy could outperform other groups based on obtained scores. We also noticed additional factors that had a positive effect on learners' performance. |
Changrong Xiao, Wenxing Ma, Qingping Song, Sean Xin Xu, Kunpeng Zhang, Yufang Wang and Qi Fu | Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs | Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable consistency, generalizability, and explainability. We propose an open-source LLM-based AES system, inspired by the dual-process theory. Our system offers accurate grading and high-quality feedback, at least comparable to that of fine-tuned proprietary LLMs, in addition to its ability to alleviate misgrading. Furthermore, we conduct human-AI co-grading experiments with both novice and expert graders. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders, particularly for essays where the model has lower confidence. These results highlight the potential of LLMs to facilitate effective human-AI collaboration in the educational context, potentially transforming learning experiences through AI-generated feedback. |
Christothea Herodotou, Jessica Carr, Sagun Shrestha, Catherine Comfort, Vaclav Bayer, Claire Maguire, John Lee, Paul Mulholland and Miriam Fernandez | Prescriptive analytics motivating distance learning students to take remedial action: A case study of a student-facing dashboard | Student-facing learning analytics dashboards aim to help students to monitor their study progress, achieve learning goals and develop self-regulation skills. Only a few of them present personalised data visualisations and aim to develop agentic students who take remedial action to improve their study habits, learning and performance. In this paper, a student-facing dashboard, designed following principles of participatory research, was tested with 30 undergraduate students, who engaged with it over a period of 4 to 15 weeks while studying an online course. This is one of the few dashboards available that presents all three types of analytics to students: descriptive, predictive and prescriptive. A mixed methods approach was used to assess its usefulness and impact on motivation to study and take remedial action to support learning. Data analysis showcased that such a dashboard can be "a roadmap to success" by motivating students to study more and improve their performance, in addition to helping with monitoring, planning and reflection. Implications for future studies are discussed. |
Pakon Ko, Cong Liu, Nancy Law, Yuanru Tan and David Williamson Shaffer | Exploring students' epistemic orientation, learning trajectories, and outcomes | The influence of students' epistemic orientations on their learning behavior and outcomes is well-documented. However, limited research explores students' epistemic orientations in terms of conceptual engagement and learning outcomes. This study, set within the context of higher education, examined the patterns of conceptual engagement among two performance groups and identified differences in their epistemic orientations. Both epistemic network analysis (ENA) and ordered network analysis (ONA) methods were used. The results from the ENA revealed distinct trajectories and patterns of conceptual engagement between high-performing and low-performing students during different periods in their learning journey. High-performing students were able to establish a more interconnected and distributed epistemic network earlier than their low-performing counterparts. ONA results revealed that (1) high-performing students were more inclined to employ abstract theoretical concepts to address empirical concerns, doing so more frequently and earlier; and (2) low-performing students benefitted from forum interactions with high-performing students to expand their knowledge resources and engagement with theoretical constructs over time. These discoveries contribute to our comprehension of epistemic orientations in different learners. The implications of this study could help generate learning analytics that monitor students' conceptual engagement in forum discussions and provide feedback to guide the design of learning. |
Gabrielle Martins Van Jaarsveld, Jacqueline Wong, Martine Baars, Marcus Specht and Fred Paas | Scaling goal-setting interventions in higher education using a conversational agent: Examining the effectiveness of guidance and adaptive feedback | Goal setting is the first and driving stage of the self-regulated learning cycle. Studies have shown that supporting goal setting is an effective means of improving academic performance among higher education students. However, doing so can be complex and resource intensive. In this study, a goal-setting conversational agent was designed and deployed to support higher education students in setting academic goals. Across five weeks, we tested the effects of goal-setting prompts (guided vs. unguided) and adaptive feedback (with vs. without) when delivered via a goal-setting conversational agent. We explored the effects of these supports (i.e., guidance and feedback) on students' 1) goal quality and 2) goal attainment. Findings showed that guidance and feedback combined had the largest positive effect on goal quality. They also revealed that guidance alone produced initially high-quality goals which decreased in quality over time, whereas feedback had a delayed but cumulative effect on quality across multiple goal-setting iterations. However, neither guidance nor feedback had significant effects on goal attainment, and there was no significant relationship between goal quality and attainment. This study provides insights into how a goal-setting conversational agent and adaptive feedback can be used to support the academic goal-setting process for higher education students. |
Yige Song, Eduardo Oliveira, Paula de Barba, Michael Kirley and Pauline Thompson | Investigating Validity and Generalisability in Trace-Based Measurement of Self-Regulated Learning: A Multidisciplinary Study | Self-regulated learning (SRL) skills are crucial for effective learning and academic success. With the increasing availability of trace data on students' online activities, researchers have sought to directly measure SRL processes from this data. However, challenges remain regarding validity (accuracy of SRL inferences) and generalisability (applicability across contexts). This study investigates these challenges by focusing on the same group of students enrolled in two first-year university subjects from different disciplines. To investigate validity, we incorporated multiple data sources to compare and validate two trace-SRL frameworks: data-driven and theory-driven. For generalisability, we examined how these frameworks performed across the two subjects. Our analysis included 76 initial survey responses, over 300 daily SRL survey responses, and more than 6,000 sequences of learning actions recorded as trace data. The findings indicate that subject-specific factors influence learning behaviours, but student-specific factors account for most SRL variance. Additionally, discrepancies in detecting SRL phases (planning, engagement, reflection) across disciplines highlight the complexity of capturing SRL processes from trace data. This research offers important insights for developing more valid and generalisable trace-SRL frameworks to support students across diverse digital learning environments. |
Mohammed Saqr, Sonsoles López-Pernas, Tiina Törmänen, Rogers Kaliisa, Kamila Misiejuk and Santtu Tikka | Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes | This paper proposes a novel analytical framework: Transition Network Analysis (TNA), an approach that integrates Stochastic Process Mining and probabilistic graph representation to model, visualize, and identify transition patterns in learning process data. Combining the relational and temporal aspects into a single lens offers capabilities beyond either framework, including centralities to capture important learning events, community finding to identify patterns of behavior, and clustering to reveal temporal patterns. This paper introduces the theoretical and mathematical foundations of TNA. To demonstrate the functionalities of TNA, we present a case study with students (n=191) engaged in small-group collaboration to map patterns of group dynamics using the theories of co-regulation and socially-shared regulated learning. The analysis revealed that TNA could reveal the regulatory processes and identify important events, temporal patterns and clusters. Bootstrap validation established the significant transitions and eliminated spurious transitions. In doing so, we showcase TNA's utility to capture learning dynamics and provide a robust framework for investigating the temporal evolution of learning processes. Future directions include, inter alia, advancing estimation methods, expanding reliability assessment, exploring longitudinal TNA, and comparing TNA networks using permutation tests. |
Christof Imhof, Martin Hlosta and Per Bergamin | Will they or won't they make it in time? The role of contextual and behavioral predictors in reaching deadlines of mandatory assignments | Procrastination and other forms of irrational delay are widespread among university students, leading to an array of potential negative consequences. While the reasons for this type of behavior are manifold and many facilitating factors have been identified, which of these factors are able to predict dilatory behavior in online/distance education has received comparatively little attention in the literature so far. In this study, we set out to compare the performance of two sets of objective predictors of delay, namely contextual variables based on characteristics of the assignment, and behavioral variables based on log data. Using historical data drawn from our university's learning management system, we calculated Bayesian multilevel models. The strongest and most consistent predictors turned out to be the interval between the first click on the assignment and its deadline, the interval between the start of a block and the first click on the assignment, the number of clicks on the assignment, and the deadline type. The combination of both sets of predictors slightly improved the model's performance. |
Devika Venugopalan, Ziwen Yan, Conrad Borchers, Jionghao Lin and Vincent Aleven | Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support | Caregivers (i.e., parents and other members of a child's caring community) are underappreciated stakeholders in learning analytics. Although caregiver involvement can enhance student academic outcomes, many obstacles hinder involvement, most notably knowledge gaps in modern school curricula. An emerging topic of interest in learning analytics is hybrid tutoring, which includes supporting learning on instructional and motivational levels. Caregivers assume similar roles in homework, yet it is unknown how learning analytics can support them. We developed a system that provides instructional support to caregivers through conversational recommendations generated by a Large Language Model (LLM). Addressing known instructional limitations of LLMs, we leverage instructional intelligence from tutoring systems while conducting prompt engineering experiments with the open-source Llama 3 LLM. This LLM generated message recommendations for caregivers supporting their child's math practice via chat. Few-shot prompting and combining real-time problem-solving context from tutoring systems with examples of tutoring practices yielded desirable message recommendations. These recommendations were evaluated with ten middle school caregivers, who were found to value recommendations facilitating content-level support and student metacognition (e.g., self-explanation). We contribute insights into how tutoring systems can be best merged with LLMs, and how LLM-generated conversational support can facilitate effective caregiver involvement in tutoring systems. |
Hannah Deininger, Cora Parrisius, Rosa Lavelle-Hill, Detmar Meurers, Ulrich Trautwein, Benjamin Nagengast and Gjergji Kasneci | Who Did What to Succeed? Individual Differences in Which Learning Behaviors Are Linked to Achievement | It is commonly assumed that digital learning environments such as intelligent tutoring systems facilitate learning and positively impact achievement. This study explores how different groups of students exhibit distinct relationships between learning behaviors and academic achievement in an intelligent tutoring system for English as a foreign language. We examined whether these differences are linked to students' prior knowledge, personality traits, and motivation. We collected behavioral trace data from 507 German seventh-grade students during the 2021/22 school year and applied machine learning models to predict English performance based on learning behaviors (best-performing model's R² = .41). To understand the impact of specific behaviors, we applied the explainable AI method SHAP and identified three student clusters with distinct learning behavior patterns. Subsequent analyses revealed that these clusters also varied in prior knowledge and motivation: one with high prior knowledge and average motivation, another with low prior knowledge and average motivation, and a third with both low prior knowledge and low motivation. Our findings suggest that learning behaviors are linked differently to academic success across students and are closely tied to their prior knowledge and motivation. This hints at the importance of personalizing learning systems to better support individual learning needs. |
Mónica Hernandez-Campos, Isabel Hilliger-Carrasco and Francisco-José García-Peñalvo | Evaluating Learning Outcomes Through Curriculum Analytics: Actionable Insights for Curriculum Decision-making | Learning analytics (LA) emerged with the promise of improving student learning outcomes (LOs); however, its effectiveness in informing actionable insights remains a challenge. Curriculum analytics (CA), a subfield of LA, seeks to address this by using data to inform curriculum development. This study explores using CA to evaluate LOs through direct standardized measures at the subject level, examining how this process informs curriculum decision-making. Conducted at an engineering-focused higher education institution, the research involved 32 administrators and 153 faculty members, serving 9,906 students across nine programs. Utilizing the Integrative Learning Design Framework, we carried out three phases of this framework and present key results. Findings confirm the importance of stakeholder involvement throughout different design phases, highlighting the need for ongoing training and support. Among the actionable insights that emerged from LOs assessments, we identified faculty reflections regarding the need to incorporate active learning strategies, improve course planning, and acknowledge the need for education-specific training for faculty development. Although the study does not demonstrate whether these insights lead to improvements in LOs, this paper contributes to the CA field by offering a practical approach to evaluating LOs and translating these assessments into actionable improvements within a real-world educational context. |
Pauline Aguinalde and Jinnie Shin | Talking in Sync: How Linguistic Synchrony Shapes Teacher-Student Conversation in English as a Second Language Tutoring Environment | Linguistic synchrony, or alignment, has been shown to be critical for student learning, particularly for L2 students (second language learners), whose patterns of synchrony often differ from fluent speakers due to proficiency constraints. While many studies have explored various dimensions of synchrony in global language tutoring contexts, there is a gap in understanding how linguistic synchrony evolves dynamically over the course of a tutoring session and how tutors' pedagogical strategies influence this process. This study incorporates three dimensions of synchrony--lexical, syntactic, and semantic--along with tutors' dialogue acts to evaluate their association with student performance using multivariate time-series analysis. Results indicate that lower-performing L2 students tend to lexically align with their tutor more consistently in the long term and with higher intensity in the short term. In contrast, higher-performing students demonstrate greater alignment with the tutor in syntactic and semantic dimensions. Furthermore, the dialogue acts of eliciting, scaffolding, and enquiry were found to play the strongest roles in influencing synchrony and impacting learning outcomes. |
Josmario Albuquerque, Bart Rienties and Blaženka Divjak | Decoding Learning Design decisions: A Cluster Analysis of 12,749 Teaching and Learning Activities | Substantial progress has been made in how educators can be supported to implement effective learning design (LD) with learning analytics (LA). However, how educators make micro-decisions about designing individual teaching and learning activities (TLAs), and how these are related to wider pedagogical approaches, has received limited empirical support. This study explored how 165 educators designed and integrated 12,749 TLAs in 218 LDs using clustering, pattern-mining, and correlational analysis. The findings suggest most educators use a combination of four common TLA types (i.e., Collaboration, Generating independent learning, Assessment, and Traditional classroom activities). These four common TLA types could be used to develop LA and Generative Artificial Intelligence (Gen-AI) approaches to support educators in making more informed and evidence-based design decisions for effective learning and teaching. |
Haejin Lee, Clara Belitz, Nidhi Nasiar and Nigel Bosch | XAI Reveals the Causes of Attention Deficit Hyperactivity Disorder (ADHD) Bias in Student Performance Prediction | Uncovering algorithmic bias related to sensitive attributes is crucial. However, understanding the underlying causes of bias is even more important to ensure fairer outcomes. This study investigates bias associated with Attention Deficit Hyperactivity Disorder (ADHD) in a machine learning model predicting students' test scores. While fairness metrics did not reveal significant bias, we observed potential subtle bias, indicated by variations in model performance for students with ADHD. To uncover the causes of this potential bias, we correlated SHapley Additive exPlanations (SHAP) values with the model's prediction errors, identifying the features most strongly associated with increasing prediction errors. Behavioral and self-reported survey features designed to measure students' use of effective learning strategies were identified as potential causes of the model underestimating test grades for students with ADHD. Behavioral features had a stronger correlation between absolute SHAP values and prediction errors (up to r = .354, p = .013) for students with ADHD than for those without ADHD. Students with ADHD often use unique yet effective approaches to studying in online learning environments--approaches that may not be fully captured by traditional measures of typical student behaviors. These insights suggest adjusting feature design to better account for students with ADHD and mitigate bias.
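As an editorial illustration of the analysis this abstract describes (not the authors' code), one could correlate absolute SHAP values with prediction errors separately per group; all data below are synthetic stand-ins.

```python
import numpy as np
import shap
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for the study's data: features, test scores, ADHD mask
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2 + rng.normal(size=200)
adhd = rng.random(200) < 0.2

model = RandomForestRegressor(random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)  # (n_samples, n_features)
errors = np.abs(model.predict(X) - y)                   # absolute prediction errors

# Correlate |SHAP| with prediction error per feature, separately per group
for j in range(X.shape[1]):
    for name, mask in (("ADHD", adhd), ("non-ADHD", ~adhd)):
        r, p = pearsonr(np.abs(shap_values[mask, j]), errors[mask])
        print(f"feature {j} ({name}): r={r:.3f}, p={p:.3f}")
```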
Halim Acosta, Daeun Hong, Seung Lee, Wookhee Min, Bradford Mott, Cindy Hmelo-Silver and James Lester | Collaborative Game-based Learning Analytics: Predicting Learning Outcomes from Game-based Collaborative Problem Solving Behaviors | Collaborative problem-solving (CPS) skills are essential for the 21st century, enabling students to solve complex problems effectively. As the demand for these skills rises, understanding their development and manifestation becomes increasingly important. To address this need, we present a data-driven framework that identifies behavioral patterns associated with CPS practices and assesses students' learning outcomes, providing explainable insights into the relationship between behaviors and learning performance. We employ embedding and clustering techniques to categorize similar trace logs and apply Latent Dirichlet Allocation to generate meaningful descriptors. Constraint-based pattern mining algorithms extract significant behavioral motifs for predictive modeling. To capture the temporal evolution of student behaviors, we introduce a graph-based representation of transitions between behavior patterns. We map behavioral patterns to a CPS ontology by analyzing how action sequences correspond to specific CPS practices. Analysis of semi-structured trace log data from 61 middle school students engaged in collaborative game-based learning reveals that the extracted behavioral patterns significantly predict student learning gains using generalized additive models. Our analysis identifies patterns that provide insights into the relationship between student use of CPS practices and learning outcomes.
Mohammad Khalil, Farhad Vadiee, Ronas Shakya and Qinyi Liu | Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation | In this study, we explore the growing potential of AI and deep learning technologies, particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs), for generating synthetic tabular data. Access to quality student data is critical for advancing learning analytics, but privacy concerns and stricter data protection regulations worldwide limit its availability and usage. Synthetic data offers a promising alternative. We investigate whether synthetic data can be leveraged to create artificial students to serve learning analytics models. Using the popular GAN model CTGAN and three LLMs (GPT-2, DistilGPT-2, and DialoGPT), we generate synthetic tabular student data. Our results demonstrate the strong potential of these methods to produce high-quality synthetic datasets that resemble real student data. To validate our findings, we apply a comprehensive set of utility evaluation metrics to assess the statistical and predictive performance of the synthetic data and compare the different generator models used, especially the performance of the LLMs. Our study aims to provide the learning analytics community with valuable insights into the use of synthetic data, laying the groundwork for expanding the field's methodological toolbox with innovative approaches for learning analytics data generation.
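For readers unfamiliar with CTGAN, a minimal sketch with the open-source `ctgan` package; the student records here are hypothetical, not the study's data.

```python
import numpy as np
import pandas as pd
from ctgan import CTGAN

# Hypothetical student records standing in for real (private) data
rng = np.random.default_rng(0)
students_df = pd.DataFrame({
    "gpa": rng.uniform(1.0, 4.0, 500).round(2),
    "clicks": rng.poisson(120, 500),
    "program": rng.choice(["CS", "Math", "Bio"], 500),
    "passed": rng.choice(["yes", "no"], 500),
})

model = CTGAN(epochs=10)  # far more epochs in practice
model.fit(students_df, discrete_columns=["program", "passed"])
synthetic_students = model.sample(1000)  # "students that never existed"
```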
Jae-Eun Russell, Anna-Marie Smith, Salim George, Jonah Pratt, Brian Fodale, Cassandra Monk and Adam Brummett | Unlocking Insights: Investigating Student AI Tutor Interactions in a Large Introductory STEM Course | This study explores the use of an AI tutor and its relationship to performance outcomes in a large introductory STEM course, where the AI tutor was integrated into the homework system interface. The course included 13 weekly homework assignments, comprising 221 questions that contributed 19.5% to the final grade. Findings showed that students predominantly completed homework problems independently, using the AI tutor selectively for specific challenges. Patterns of AI interaction varied at both the problem and student levels, with demographic factors showing little influence on AI usage behavior. Notably, the frequency of AI use was not linked to exam performance. A cluster analysis revealed diverse patterns of student behavior in relation to AI tutor use during the problem-solving process. The paper discusses these varying interactions in detail, along with the study's limitations.
Kathrin Seßler, Maurice Fürstenberg, Babette Bühler and Enkelejda Kasneci | Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring | The manual assessment and grading of student writing is a time-consuming yet critical task for teachers. Recent developments in generative AI, such as large language models, offer potential solutions to facilitate essay-scoring tasks for teachers. In our study, we evaluate the performance and reliability of both open-source and closed-source LLMs in assessing German student essays, comparing their evaluations to those of 37 teachers across 10 pre-defined criteria (e.g., plot logic, expression). A corpus of 20 real-world essays from Year 7 and 8 students was analyzed using five LLMs: GPT-3.5, GPT-4, o1, LLaMA 3-70B, and Mixtral 8x7B, aiming to provide in-depth insights into LLMs' scoring capabilities. Closed-source GPT models outperform open-source models in both internal consistency and alignment with human ratings, particularly excelling in language-related criteria. The novel o1 model outperforms all other LLMs, achieving Spearman's r = .74 with human assessments in the overall score, and an internal consistency of ICC = .80. These findings indicate that LLM-based assessment can be a useful tool to reduce teacher workload by supporting the evaluation of essays, especially with regard to language-related criteria. However, due to their tendency toward higher scores, the models require further refinement to better capture aspects of content quality.
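The two agreement statistics reported here (Spearman's r and ICC) can be computed as in this sketch; the scores are illustrative, not from the study.

```python
import pandas as pd
from scipy.stats import spearmanr
import pingouin as pg

# Illustrative overall scores for 5 essays, rated by an LLM and a teacher
llm_scores = [3, 4, 2, 5, 4]
teacher_scores = [3, 5, 2, 4, 4]
r, p = spearmanr(llm_scores, teacher_scores)  # rank agreement with humans

# ICC expects long format: one row per (essay, rater) pair
long = pd.DataFrame({
    "essay": [1, 2, 3, 4, 5] * 2,
    "rater": ["llm"] * 5 + ["teacher"] * 5,
    "score": llm_scores + teacher_scores,
})
icc = pg.intraclass_corr(data=long, targets="essay", raters="rater", ratings="score")
print(icc[["Type", "ICC"]])
```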
Yu Fang, Shihong Huang and Amy Ogan | A Cross-Cultural Confusion Model for Detecting and Evaluating Students' Confusion In a Large Classroom | In traditional lecture settings, it is challenging to identify which parts of the lecture material students are struggling with. One approach to identifying difficult concepts is to capture students' confusion during class time. However, most existing confusion detectors focus on an individual student rather than a classroom, and on a single ethnic group, which can propagate bias when developing pedagogical technologies. In this paper, we combine two existing 'Confused' facial expression datasets (DAiSEE and DevEmo) with an East Asian 'Confused' facial expression dataset that we collected. Through model performance and explainable AI, we examined the potential for cultural bias in detecting emotions, particularly confusion, and identified culturally-specific features that align with prior research. As a proof of concept, we deployed this cross-cultural confusion machine learning model in a live semester-long class. Overall, our work integrating cross-cultural facial features exemplifies the importance of fostering inclusivity in educational technologies.
Mohammad Hassany, Peter Brusilovsky, Jaromir Savelka, Arun Balajiee Lekshmi Narayanan, Kamil Akhuseyinoglu, Arav Agarwal and Rully Agus Hendrawan | Generating Effective Distractors for Introductory Programming Challenges: LLMs vs Humans | As large language models (LLMs) show great promise in generating a wide spectrum of educational materials, robust yet cost-effective assessment of the quality and effectiveness of such materials becomes an important challenge. Traditional approaches, including expert-based quality assessment and student-centered evaluation, are resource-intensive and do not scale efficiently. In this work, we explored the use of pre-existing student learning data as a promising approach to evaluating LLM-generated learning materials. The dataset included responses from 1,071 students across 22 classes taught from Fall 2017 to Spring 2023. We evaluated five prominent LLMs (OpenAI-o1, GPT-4, GPT-4o, GPT-4o-mini, and Llama-3.1-8b) across three different prompts to see which combinations result in more effective distractors, i.e., those that are plausible (often picked by students) and potentially based on common misconceptions. Our results suggest that GPT-4o was the most effective model, matching close to 50% of the functional distractors originally generated by humans. At the same time, all of the evaluated LLMs generated many novel distractors, i.e., those that did not match the pre-existing human-crafted ones. Our preliminary analysis shows that these appear promising; establishing their effectiveness in real-world classroom settings is left for future work.
Danielle R Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz and Kenneth R Koedinger | Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | The role of multiple-choice questions (MCQs) as effective learning tools has been debated within learning analytics. While MCQs are widely used due to their ease of grading, the field is increasingly moving toward using open-response questions for instruction, given advancements in large language models (LLMs) for automated grading. This study evaluates the effectiveness of MCQs relative to open-response questions, both individually and in combination, on learning. These activities are embedded within six tutor lessons on advocacy. Using a posttest-only randomized controlled design, we compare the performance of 234 tutors (790 lesson completions) across three conditions: MCQ only, open response only, and a combination of both. We find no significant learning differences across conditions at posttest, but tutors in the MCQ condition took significantly less time to complete instruction. These findings suggest that MCQs are as effective as, or potentially more effective than, open-response tasks for learning when practice time is limited. To further enhance efficiency, we autograded open responses using GPT-4o and GPT-4-turbo. GPT models demonstrate proficiency for purposes of low-stakes assessment, though further research is needed for broader use. This study contributes a dataset of lesson log data, human annotation rubrics, and LLM prompts to promote transparency and reproducibility.
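A hedged sketch of autograding an open response with a chat model, in the spirit of (but not identical to) the GPT-4o grading described above; the rubric text and grading scale are placeholders, not the authors' prompts.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = "Response praises effort and suggests a concrete next step."  # placeholder

def autograde(tutor_response: str) -> str:
    """Low-stakes 0/1 grading of one open response against a rubric."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"Grade the tutor's response against this rubric:\n"
                        f"{RUBRIC}\nReply with only 1 (meets) or 0 (does not)."},
            {"role": "user", "content": tutor_response},
        ],
    )
    return completion.choices[0].message.content.strip()
```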
Danielle R Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz and Kenneth R Koedinger | Do Tutors Learn from Equity Training and Can Generative AI Assess It? | Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale, are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications, and the present work assesses their use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. We find marginally significant learning gains, with increases in tutors' self-reported confidence in their knowledge of responding to middle school students experiencing possible inequities from pretest to posttest. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors' ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine few-shot learning using GPT-4o to be the preferred model. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves improving the alignment in difficulty among scenarios and refining LLM prompts for large-scale grading and assessment.
Linxuan Zhao, Mladen Raković, Elizabeth B. Cloude, Xinyu Li, Dragan Gašević and Lisa Bardach | The Effect of Sequential Transition of Self-Regulated Learning Processes on Performance: Insights from Ordered Network Analysis | Productively engaging in self-regulated learning (SRL) is challenging for learners since it involves coordinating multiple motivational, affective, cognitive, and metacognitive processes. Researchers have investigated methods to adaptively scaffold learners' productive engagement using SRL processes automatically captured by SRL detectors. However, most previous studies relied solely on the frequency of SRL processes to drive adaptive scaffolds (e.g., feedback, hints), possibly missing the sequential characteristics inherent to self-regulation, a crucial dimension of productive SRL. To address this gap, this study analysed the impact of sequential transitions between multiple SRL processes on learners' performance in a reading-writing task within a hypermedia environment called Flora. A sample of 66 secondary-school learners completed the task, and trace data were collected. Grounded in the COPES model of SRL, a rule-based SRL detector was employed to capture SRL processes from the collected trace data. We employed a method combining logistic regression with ordered network analysis (ONA) to analyse the transitions between the detected SRL processes. This exploratory study revealed several transitions that influenced learners' performance in different temporal learning blocks of self-regulation. The implications suggest the potential of using COPES SRL process transitions to drive adaptive scaffolds that facilitate engagement in productive SRL, benefiting performance outcomes in hypermedia environments.
Kimberly Williamson, Rene Kizilcec, Sean Fath and Neil Heffernan | Algorithm Appreciation in Education: Educators Prefer Complex over Simple Algorithms | Algorithm aversion among educators can pose a challenge to the adoption of AI tools in education, especially when complex algorithms are involved. This study investigates how providing explanations for a complex algorithm in an intelligent tutoring system (ITS) affects educators' attitudes, trust, and willingness to adopt the tool. In two randomized experiments (n=570), we compare educator preferences between a simple heuristic algorithm and a complex (Bayesian Knowledge Tracing) algorithm, focusing on how explanations for the complex algorithm can improve attitudes and adoption. Surprisingly, we found that educators generally preferred the complex over the simple algorithm, and explanations did not improve attitudes or adoption intentions, even when educators had to explain the complex algorithm's predictions. The complex algorithm scored lower on informational fairness than the simple one, likely because it is less transparent, and the explanation was insufficient to overcome this. Overall, the findings suggest that widespread algorithm aversion may have evolved into algorithm appreciation, at least in the context of widely used technologies like ITS.
Yeyu Wang | Qualitative Parameter Triangulation: A Conceptual and Methodological Framework for Event-Based Temporal Models | Learning is a complex process that occurs over time. To represent this complex process, interest has been rising in conceptualizing and integrating temporality into model constructions. However, the construction of an event-based temporal model is challenging. Specifically, researchers struggle with translating qualitative heuristics and theoretical hypotheses into quantifiable temporal parameters. Existing methods of parameter derivation also suffer from issues of model transparency and oversimplification of learning contexts. Thus, we propose a conceptual and methodological framework, Qualitative Parameter Triangulation (QPT), to center human interpretation as the first step of modeling. Based on human interpretations, QPT constructs a qualitative loss function and optimizes temporal parameters automatically. The final step is to check hermeneutic alignment between a global representation and deictic qualitative evidence at specific learning moments. By presenting a worked example of QPT, we demonstrated the process of maintaining pairwise alignments across interpretation, systematization, and approximation. As a proof of concept, QPT is a plausible framework for determining temporal parameters and constructing event-based temporal models.
Jaeyoon Choi, Shamya Karumbaiah, Jeffrey Matayoshi and Daniel Bolt | Bias or Insufficient Sample Size? Improving Reliable Estimation of Algorithmic Bias for Minority Groups | Despite the prevalent use of predictive models in learning analytics, several studies have demonstrated that these models can show disparate performance across different demographic groups of students. The first step to audit for and mitigate bias is to accurately estimate it. However, the current practice of identifying and measuring group bias faces reliability issues. In this paper, we use simulations and real-world data analysis to explore statistical factors that impact the reliability of bias estimation and suggest approaches to account for it. Our analysis revealed that small group sizes lead to high variability in group bias estimation due to sampling error -- an issue that is more likely to impact students from historically marginalized communities. We then suggest statistical approaches, such as bootstrapping, to construct confidence intervals for a more reliable estimation of group bias. Based on our findings, we encourage future learning analytics research to ensure sufficiently large group sizes, construct confidence intervals, use at least two metrics, and move beyond the dichotomy of the presence or absence of bias for a more comprehensive evaluation of group bias. |
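A minimal sketch of the bootstrap approach this abstract recommends: resample within each group and build a confidence interval for the between-group AUC gap. All inputs are synthetic stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical predictions: labels y, model scores s, minority-group mask g
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
s = np.clip(y * 0.3 + rng.random(n) * 0.7, 0, 1)
g = rng.random(n) < 0.1  # small minority group, the paper's key concern

gaps = []
for _ in range(2000):
    idx_a = rng.choice(np.where(g)[0], size=g.sum(), replace=True)
    idx_b = rng.choice(np.where(~g)[0], size=(~g).sum(), replace=True)
    # (in practice, stratify resampling by label so both classes appear)
    gaps.append(roc_auc_score(y[idx_a], s[idx_a]) - roc_auc_score(y[idx_b], s[idx_b]))

lo, hi = np.percentile(gaps, [2.5, 97.5])  # CI for the between-group AUC gap
```

Smaller minority groups produce visibly wider intervals, which is exactly the sampling-error effect the authors describe.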
Alrike Claassen, Negin Mirriahi, Vitomir Kovanovic and Shane Dawson | From Data to Design: Integrating Learning Analytics into Educational Design for Effective Decision-Making | Learning Analytics (LA) aims to provide university instructors with meaningful data and insights that can be used to improve courses. However, instructors are often met with challenges that arise when wanting to use LA to inform their educational design decisions. For instance, there may be a misalignment between instructors' needs and the data and insights LA systems provide. Further research is required to understand instructors' expectations of LA and how it can support the diversity of educational designs. This case study addresses this gap by investigating the role of LA in instructors' educational decision-making processes. The study employs self-determination theory's constructs to examine instructors' existing practices when using LA to support their decision-making. The study reveals that LA enables instructors to make data-informed iterative educational design decisions, supporting their need for competence and relatedness. The emotional aspect of LA is an important consideration, as it can easily lead to demotivation and avoidance of LA. Support is needed to address instructors' psychological needs so instructors can fully utilise LA to make effective educational design decisions. The findings inform a framework for considering how instructors' data-informed educational decision-making can be understood. The implications of our findings and opportunities for the future are discussed.
Kaixun Yang, Mladen Raković, Zhiping Liang, Lixiang Yan, Zijie Zeng, Yizhou Fan, Dragan Gašević and Guanliang Chen | Modifying AI, Enhancing Essays: How Active Engagement with Generative AI Boosts Writing Quality | Students are increasingly relying on Generative AI (GAI) to support their writing--a key pedagogical practice in education. In GAI-assisted writing, students can delegate core cognitive tasks to GAI while still producing high-quality essays. This creates new challenges for teachers in assessing and supporting student learning, as they often lack insight into whether students are engaging in meaningful cognitive processes during writing or how much of the essay's quality can be attributed to those processes. This study aimed to help teachers better assess and support student learning in GAI-assisted writing by examining how different writing behaviors, especially those indicative of meaningful learning versus those that are not, impact essay quality. Using a dataset of 1,445 GAI-assisted writing sessions, we applied the X-Learner method to quantify the causal impact of three GAI-assisted writing behavioral patterns on four measures of essay quality. Our analysis showed that writers who frequently modified GAI-generated text--suggesting active engagement in higher-order cognitive processes--consistently improved essay quality. In contrast, those who often accepted GAI-generated text without changes, primarily engaging in lower-order processes, saw a decrease in essay quality. Additionally, while human writers tend to introduce linguistic bias when writing independently, incorporating GAI-generated text--even without modification--can help mitigate this bias.
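The X-Learner named here is implemented in, for example, the `econml` package; a sketch under synthetic stand-ins for the study's variables (the authors' exact setup may differ).

```python
import numpy as np
from econml.metalearners import XLearner
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Hypothetical stand-ins: session features, a binary "frequently modified
# GAI-generated text" indicator, and an essay-quality outcome
rng = np.random.default_rng(0)
Xf = rng.normal(size=(500, 4))
t = (rng.random(500) < 0.5).astype(int)
quality = 0.5 * t + Xf[:, 0] + rng.normal(scale=0.5, size=500)

xl = XLearner(models=GradientBoostingRegressor(),
              propensity_model=GradientBoostingClassifier())
xl.fit(quality, t, X=Xf)
cate = xl.effect(Xf)  # per-session estimated effect of the behaviour
print(cate.mean())
```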
Yueqiao Jin, Kaixun Yang, Lixiang Yan, Vanessa Echeverria, Linxuan Zhao, Riordan Alfredo, Mikaela Milesi, Jie Fan, Xinyu Li, Dragan Gasevic and Roberto Martinez-Maldonado | Chatting with a Learning Analytics Dashboard: The Role of Generative AI Literacy on Learner Interaction with Conventional and Scaffolding Chatbots | Learning analytics dashboards (LADs) simplify complex learner data into accessible visualisations, providing actionable insights for educators and students. However, their educational effectiveness has not always matched the sophistication of the technology behind them. Explanatory and interactive LADs, enhanced by generative AI (GenAI) chatbots, hold promise by enabling dynamic, dialogue-based interactions with data visualisations and offering personalised feedback through text. Yet, the effectiveness of these tools may be limited by learners' varying levels of GenAI literacy, a factor that remains underexplored in current research. This study investigates the role of GenAI literacy in learner interactions with conventional (reactive) versus scaffolding (proactive) chatbot-assisted LADs. Through a comparative analysis of 81 participants, we examine how GenAI literacy is associated with learners' ability to interpret complex visualisations and their cognitive processes during interactions with chatbot-assisted LADs. Results show that while both chatbots significantly improved learner comprehension, those with higher GenAI literacy benefited the most, particularly with conventional chatbots, demonstrating diverse prompting strategies. Findings highlight the importance of considering learners' GenAI literacy when integrating GenAI chatbots in LADs and educational technologies. Incorporating scaffolding techniques within GenAI chatbots can be an effective strategy, offering a more guided experience that reduces reliance on learners' GenAI literacy.
Qinyi Liu, Oscar Deho, Farhad Vadiee, Mohammad Khalil, Srecko Joksimovic and George Siemens | Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms | The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models. |
Xinyun He, Qi Shu, Mo Zhang, Wei Huang, Han Zhao and Mengxiao Zhu | Beyond Final Products: Multi-Dimensional Essay Scoring Using Keystroke Logs and Deep Learning | Essay assessment plays a crucial role in evaluating students' abilities in logical reasoning, critical thinking, and creativity. However, traditional manual scoring methods often suffer from inefficiencies due to fatigue, bias, and emotional factors, compromising objectivity. Automated Essay Scoring (AES) systems offer a more efficient and impartial alternative, yet most existing systems focus primarily on evaluating the final written product, overlooking the valuable data captured during the writing process. To address this issue, we introduce an innovative model called KAES, which explores the potential of integrating writing process data to enhance AES performance. By leveraging multiple data sources, including text content, prompts, keystroke dynamics, and manually extracted features, the KAES model extracts meaningful insights and employs a multi-task learning approach to assess essays across both language and argumentative dimensions. Extensive experiments on the real-world CBAL dataset demonstrate that the KAES model significantly outperforms traditional baseline models, highlighting the effectiveness of incorporating writing process data into AES tasks. |
Soyeon Mun and Il-Hyun Jo | Evolving the 4C/ID model through learning analytics approaches: Teaching and learning system design framework for supporting learners' complex problem-solving | In recent years, there has been a surge in studies focusing on learning analytics (LA) aimed at collecting and analyzing learning trace data from various digital learning platforms. However, there is an urgent need for these platforms to be designed from the ground up by LA experts, ensuring that data collection and analysis procedures align with the objectives and activities of teaching and learning as informed by established instructional theories. In response, we developed a novel framework that integrates the 4C/ID model with learning analytics approaches to enhance learners' complex problem-solving abilities within online and blended learning environments. In developing this framework, we focused on the four core components of the original 4C/ID model along with the P-A-S cycle to determine optimal timing and methods for data collection and analysis. Our aim is to propose a framework that not only revisits and reinterprets the 4C/ID model but also fosters the development of a learning analytics system embedded within digital learning platforms and closely tied to educational goals and learning activities. The findings of this study can serve as a valuable resource for designing and constructing adaptive teaching and learning systems, ultimately supporting learners in effectively cultivating their problem-solving skills.
Naiming Liu, Shashank Sonkar, Debshila Basu Mallick, Richard Baraniuk and Zhongzhou Chen | Atomic Learning Objectives and LLMs Labeling: A High-Resolution Approach for Physics Education | This paper introduces a novel approach to create a high-resolution "map" for physics learning: an "atomic" learning objectives (LOs) system designed to capture detailed cognitive processes and concepts required for problem-solving in a college-level introductory physics course. Our method leverages Large Language Models (LLMs) for automated labeling of physics questions and adopts comprehensive metrics to evaluate the quality of LLM labeling outcomes. The atomic LO system, covering nine chapters of an introductory physics course, uses a "subject-verb-object" structure to represent specific cognitive processes. We apply this system to 131 questions from faculty-created question banks and the OpenStax University Physics textbook. Each question is labeled with 1-8 atomic LOs across three chapters. Through extensive experiments, we compare automated LLM labeling results against human expert labeling. Our analysis reveals both the strengths and limitations of LLMs, providing insights into their reasoning processes and identifying areas for improvement in both LLM capabilities and LO system design. Our work contributes to the field of learning analytics by proposing a more granular approach to mapping learning objectives with questions. Our findings have significant implications for the development of personalized learning pathways in STEM education, paving the way for more effective "learning GPS" systems.
Ge Gao, Amelia Leon, Andrea Jetten, Jasmine Turner, Husni Almoubayyed, Stephen Fancsali and Emma Brunskill | Short Horizon for Predicting Long-Term Outcomes using Log Data: A Data-Driven Study across Educational Contexts | Educational stakeholders are often particularly interested in sparse, delayed student outcomes, like end-of-year statewide exams. The rare occurrence of such assessments makes it harder to identify students likely to fail them, and slower for researchers and educators to assess the effectiveness of particular educational tools. Prior work has primarily focused on using logs from students' full usage (e.g., year-long) of an educational product to predict outcomes, or considered predictive accuracy using a few minutes of data to predict outcomes after a short (e.g., 1-hour) session. In contrast, we investigate whether machine learning predictors using students' logs from their first few hours of usage can provide useful predictive insight into those students' end-of-school-year external assessments. We do this on three diverse datasets: from students in Uganda using a literacy game product, and from students in the US using two mathematics intelligent tutoring systems. We consider various measures of the accuracy of the resulting predictors, including their ability to identify students at different parts of the assessment performance distribution. Our findings suggest that short-term log usage data, from 2-5 hours, can provide valuable signal about students' long-term external performance.
Steve Woollaston, Brendan Flanagan, Patrick Ocheja, Yuko Toyokawa and Hiroaki Ogata | ARCHIE: Exploring Language Learner Behaviors in LLM Chatbot-Supported Active Reading Log Data with Epistemic Network Analysis | With the increasing integration of technology in education, chatbots and e-readers have emerged as promising tools for enhancing language learning experiences. This study investigates how students engage with digital texts and a purpose-built chatbot designed to promote active reading for EFL students. We analysed student interactions and compared high-proficiency and low-proficiency English learners. Results indicate that while all students perceived the chatbot as easy to use, useful, and enjoyable, significant behavioural differences emerged between proficiency groups. High-proficiency students exhibited more frequent interactions with the chatbot, engaged in more active reading strategies like backtracking, and demonstrated fewer help-seeking behaviours. Epistemic Network Analysis revealed distinct co-occurrence patterns, highlighting the stronger connection between navigation and review behaviours in the high-proficiency group. These findings underscore the potential of chatbot-assisted language learning and emphasise the importance of incorporating active reading strategies for improved comprehension.
Abisha Thapa Magar, Anup Shakya, Steve Fancsali, Vasile Rus, April Murphy, Steve Ritter and Deepak Venugopal | "Can A Language Model Represent Math Strategies?": Learning Math Strategies from Big Data using BERT | AI models have shown a remarkable ability to perform representation learning using large-scale data. In particular, the emergence of Large Language Models (LLMs) attests to the capability of AI models to learn complex hidden structures in a bottom-up manner without requiring a lot of human expertise. In this paper, we leverage these models to learn math strategies at scale. Specifically, we use student interaction data from the MATHia Intelligent Tutoring System to learn strategies based on sequences of actions performed by students. To do this, we develop an AI model based on BERT (Bidirectional Encoder Representations from Transformers) that has two main components. First, we pre-train BERT using an approach known as Masked Language Modeling to learn embeddings for strategies. The embeddings represent strategies in a vector form while preserving their semantics. Next, we fine-tune the model to predict if students are likely to apply a correct strategy to solve a novel problem. We demonstrate, using a large dataset collected from 655 schools, that our approach, in which we pre-train to learn strategies from a sample of schools, can be fine-tuned with a small number of examples to make accurate predictions over student data collected from other schools.
Tongguang Li, Debarshi Nath, Yixin Cheng, Yizhou Fan, Xinyu Li, Mladen Raković, Hassan Khosravi, Zachari Swiecki, Yi-Shan Tsai and Dragan Gašević | Turning real-time analytics into adaptive scaffolds for self-regulated learning using generative artificial intelligence | In computer-based learning environments (CBLEs), adopting effective self-regulated learning (SRL) strategies requires sophisticated coordination of multiple SRL processes. While various studies have proposed adaptive SRL scaffolds (i.e., real-time advice on adopting effective SRL processes), two key research gaps remain. First, there is a lack of research on SRL scaffolds that are based on continuous assessment of both learners' SRL processes and learning conditions (e.g., awareness of learning resources) to provide adaptive support. Second, current analytics-based scaffolding mechanisms lack the scalability needed to effectively address multiple learning conditions. Integrating analytics of SRL with generative artificial intelligence (GenAI) can provide scalable scaffolding for real-time SRL processes and evolving conditions, yet empirical studies implementing and evaluating the effects of this integration remain scarce. To address these limitations, we conducted a randomised controlled trial, assigning participants to three groups (control, process-only, and process-with-condition) to investigate the effects of using GenAI to turn insights from real-time analytics about students' SRL processes and conditions into adaptive scaffolds. The results demonstrate that integrating real-time analytics with GenAI in adaptive SRL scaffolds -- addressing both SRL processes and dynamic conditions -- promotes more metacognitive learning patterns compared to the control and process-only groups.
Chen Sun, Valerie Shute, Angela Stewart and Sidney D'Mello | The Relationship between Collaborative Problem-Solving Skills and Group-to-Individual Learning Transfer in a Game-based Learning Environment | Collaborative problem solving (CPS) is increasingly viewed as an essential 21st century skill for the modern workforce. Accordingly, researchers have been investigating how to conceptualize, assess, and develop pedagogical approaches to improve CPS, efforts that require theoretically grounded and empirically validated frameworks of CPS. The present paper focuses on validating the generalized competency model (GCM) of CPS, specifically with respect to predicting learning outcomes in a game-based learning environment. The GCM consists of three main facets - constructing shared knowledge, negotiating/coordinating, and maintaining team function - mapped to behavioral indicators (i.e., observable evidence). We investigated the extent to which the three facets predicted group-to-individual learning transfer among 249 students who comprised 83 triads and engaged in collaborative gameplay (Physics Playground) remotely via videoconferencing. We found that the only CPS facet predicting individual physics learning was maintaining team function, after accounting for pretest scores, students' perceptions of team collaboration, and their perceived physics self-efficacy. This facet was also the only significant predictor of individual learning regardless of how the facet scores were computed (i.e., reverse coding of negative indicators, separating the sums of positive and negative indicators, and no reverse coding of negative indicators). Implications for the GCM and other CPS frameworks are discussed.
Joe Tang, Andrew Gibson and Peter Bruza | Analysis of exploratory behaviour: A step towards modelling of curiosity | In this research, we analysed exploratory behaviour trace data for students engaging in learning tasks in a technology-enhanced data analytics course as the first step towards modelling curiosity in learning. Curiosity is a complex phenomenon that is not amenable to direct modelling, but it can be understood through related behaviours like exploration, which is critical to effective learning. We analysed trace data from 40 students using visualisation and network analysis techniques, focusing on their interactions with learning tasks within the JupyterLab environment. Our analysis found that providing sufficient exploration time before explicit instruction or answer revelation, and designing learning tasks that embrace errors as opportunities, encouraged behaviours associated with curiosity-driven learning. These findings highlight the importance of designing learning environments that foster curiosity and promote active exploration. |
Rebecca Ferguson, Yuveena Gopalan and Simon Buckingham Shum | What's the Value of a Doctoral Consortium? Analysing a Decade of LAK DCs as a Community of Practice | Since 2013, the Learning Analytics and Knowledge (LAK) conference has included a Doctoral Consortium (DC). We frame the DC as a structured entry into the LAK community of practice (CoP). CoPs generate five types of value for their members: immediate, potential, applied, realised and reframing. This study used a survey of the 92 DC students from the first decade, supplemented with scientometric analysis of LAK publications, to address the questions: 'What value do students gain from attending the LAK doctoral consortium?' and 'Do students gain the same value from face-to-face and virtual doctoral consortia?' Reflexive thematic analysis of responses showed that students gained a wide range of immediate and potential value from the DC, which in many cases also prompted changes in practice, performance improvement or redefinition of success. However, the value reported by respondents who had attended virtually was more limited. This paper's contributions are (i) the first systematic documentation of student perceptions of LAK DCs, (ii) identification of ways in which doctoral consortia can be developed in the future, and (iii) specific attention to how virtual DCs can offer greater value for both participants and the host community of practice. |
Sriram Ramanathan, Lisa Lim and Simon Buckingham Shum | When the Prompt becomes the Codebook: Grounded Prompt Engineering (GROPROE) and its application to Belonging Analytics | With the emergence of generative AI, the field of Learning Analytics (LA) has increasingly embraced the use of Large Language Models (LLMs) to automate qualitative analysis. Deductive analysis requires theoretical bases to inform coding. However, few studies detail the process of translating the literature into a codebook and then into an effective LLM prompt. In this paper, we introduce Grounded Prompt Engineering (GROPROE) as a systematic process to develop a theory-grounded prompt for deductive analysis. We demonstrate our GROPROE process on a dataset of 860 students' written reflections, coding for affective engagement and sense of belonging. To evaluate the quality of the coding, we demonstrate substantial human/LLM Inter-Annotator Reliability. A subset of the data was input 60 times to measure the consistency with which each code was applied, using the LLM Quotient. We discuss the dynamics of human-AI interaction when following GROPROE, foregrounding how the prompt took over as the iteratively revised codebook, and how the LLM provoked codebook revision. The contributions to the LA field are threefold: (i) GROPROE as a systematic prompt-design process for deductive coding, (ii) a detailed worked example showing its application to Belonging Analytics, and (iii) implications for human-AI interaction in automated deductive analysis.
Blaženka Divjak, Abhinava Barthakur, Vitomir Kovanovic and Barbi Svetec | The Impact of Learning Design on the Mastery of Learning Outcomes in Higher Education | Ensuring constructive alignment between learning outcomes (LOs) and assessment design is crucial to effective learning design (LD). While previous research has explored the alignment of LOs with assessments, there is a lack of empirical studies on how assessment design influences LO mastery, particularly the relationship between formative and summative assessments. To address this gap, we conducted an empirical study within an undergraduate mathematics course. First, we evaluated the course's learning design to identify potential gaps in constructive alignment. Then, using a sample of 169 students, we analysed their assessment results to explore how LO mastery is demonstrated through formative and summative assessments. This study provides a novel learning analytics (LA) methodology by combining cognitive diagnostic models, epistemic network analysis, and social network analysis to examine LO mastery and interdependencies. Our findings reveal a strong connection between the mastery of LOs through formative and summative assessments, underscoring the importance of well-constructed LD. The practical implications suggest that LA can serve as a critical tool for quality assurance by guiding the revision of LOs and optimising LD to foster deeper student engagement and mastery of critical concepts. These insights offer actionable pathways for more targeted, student-centered teaching practices. |
Chengyuan Yao, Carmen Cortez and Renzhe Yu | Towards Fair Transfer Learning of Educational Prediction Models: A Case Study of Retention Prediction in Community Colleges | Predictive analytics is a common learning analytics application in higher education, but resource-limited institutions may lack the capacity to develop their own models and have to deploy proprietary models trained in different contexts with little transparency, creating downstream harms. Transfer learning holds promise for expanding access to predictive analytics, but this potential remains under-explored. In this study, we evaluate the risks of deploying external retention prediction models in community colleges. Using administrative records from 4 research universities and 23 community colleges, we find evidence of performance and fairness degradation when external models are applied without localization. Notably, publicly available institution-level contextual information can be used to forecast performance drops, offering early guidance for model portability. Also, for vendors developing source models under privacy constraints, sequential training that selects training institutions based on demographic similarities shows the potential to enhance fairness without compromising predictive performance. For institutions without access to the training data, we find that established techniques like source-free domain adaptation are less successful in our contexts. Instead, customizing evaluation thresholds for different sensitive groups improves both performance and fairness. These insights suggest judicious use of contextual factors in model training, selection, and deployment to achieve equitable predictive model transfer. |
Juliette Woodrow and Chris Piech | Soft Grades: A Grade Response Theory which Enables Calibrated and Accurate Grades that Express Uncertainty | In traditional educational settings, students are often summarized by a single number--a final course grade--that reflects their performance. While final grades are convenient for reporting or comparison, they oversimplify a student's true ability and do not express uncertainty. In this paper, we introduce a new item-response model for classroom settings that infers a distribution over student abilities and uses this to represent each student's final grade as a probability distribution. This approach captures the uncertainty that comes from variations in both student performance and grading processes. Practical applications of our approach include enabling teachers to better understand grading confidence, impute missing assignment scores, and make informed decisions when curving final grades. For students, the model offers probabilistic estimates of their final course grades based on current performance, supporting informed academic decisions such as opting for Pass/Fail grading. We evaluate our model using real-world datasets, showing that the Soft Grades model is well-calibrated and surpasses the state-of-the-art polytomous IRT model in accurately predicting future scores. Additionally, we present a web application that implements our model, making it accessible for both teachers and students to leverage its benefits. |
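The core idea here -- reporting a final grade as a probability distribution rather than a point value -- can be sketched generically; this is not the authors' Soft Grades model, and the ability-to-score mapping and cutoffs below are assumptions.

```python
import numpy as np

# Posterior samples of a student's ability (assumed distribution)
rng = np.random.default_rng(1)
theta = rng.normal(loc=0.8, scale=0.4, size=10_000)
score = 100 / (1 + np.exp(-theta))  # assumed mapping to a 0-100 scale

# Push each sample through assumed grade cutoffs
grades = np.select(
    [score >= 90, score >= 80, score >= 70, score >= 60],
    ["A", "B", "C", "D"], default="F",
)
labels, counts = np.unique(grades, return_counts=True)
grade_dist = dict(zip(labels, counts / len(grades)))  # e.g. {"C": 0.6, ...}
```

The spread of `grade_dist` is the uncertainty a single letter grade would hide.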
Short Research Papers
*Please note these titles and abstracts may be subject to change as they are listed with pre-publication information. Any changes to titles and/or abstracts will be updated soon.
Authors | Title | Abstract |
Julie Le Tallec, Ethan Prihar and Tanja Käser | The Effect of Different Support Strategies on Student Affect | Within many online learning platforms, struggling students are provided with support to guide them through challenging material. Support comes in many forms, and is typically evaluated based on its ability to improve students' performance on future tasks. However, there is little experimentation to evaluate how these supports impact students' emotional states. Students' emotional state, or affect, significantly impacts their motivation to engage with learning material and persist through challenges. Positive emotions can foster intrinsic engagement and deeper commitment, whereas negative emotions may lead to disengagement and avoidance of challenging tasks. In this work, we use publicly available data from online experiments and affect modeling to causally evaluate the impact that different support strategies have on students' affect. Through analysis of 25 experiments with 6,463 total participants, we find multiple significant positive and negative changes in students' affect when receiving hints, examples, or scaffolding questions, despite all three having a positive impact on performance, revealing the need for more nuanced evaluations of support strategies to uncover their impact beyond just performance.
Ryan Baker, Caitlin Mills and Jaeyoon Choi | The Difficulty of Achieving High Precision with Low Base Rates for High-Stakes Intervention | Automated detectors are routinely used in learning analytics for high-stakes, high-risk interventions. Such interventions depend on detectors with a low rate of false positives (i.e. predicting the construct is present when it is not present) in order to avoid giving an intervention where it is not needed, especially when such interventions can be costly or even harmful. This in turn suggests that such a detector needs to have high precision at the cut-off used by the detector for decision-making. However, high precision is difficult to achieve for the common case where the base rate of the target construct is low. In this paper, we demonstrate the difficulty of achieving high precision for low base rates, and demonstrate how other metrics (such as F1, Kappa, and AUC ROC) are insufficient for this specific use case and situation, despite their merits and advantages for other use cases and situations. |
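The paper's central point can be verified with a few lines of arithmetic: for a detector with fixed sensitivity and specificity (the values below are illustrative), precision collapses as the base rate falls.

```python
def precision(base_rate, sensitivity=0.8, specificity=0.9):
    """Precision = TP / (TP + FP) for a detector at a fixed operating point."""
    tp = sensitivity * base_rate
    fp = (1 - specificity) * (1 - base_rate)
    return tp / (tp + fp)

for pi in (0.50, 0.20, 0.05, 0.01):
    print(f"base rate {pi:.2f} -> precision {precision(pi):.2f}")
# base rate 0.50 -> precision 0.89
# base rate 0.20 -> precision 0.67
# base rate 0.05 -> precision 0.30
# base rate 0.01 -> precision 0.07
```

The same detector that looks trustworthy at a 50% base rate is wrong 93% of the time at a 1% base rate, while its AUC and sensitivity are unchanged.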
Ryan Baker and Stephen Hutt | MORF: A Post-Mortem | There has been increasing interest in data enclaves in recent years, both in education and other fields. Data enclaves make it possible to conduct analysis on large-scale and higher-risk data sets, while protecting the privacy of the individuals whose data is included in the data sets, thus mitigating risks around data disclosure. In this article, we provide a post-mortem on the MORF (MOOC Replication Framework) 2.1 infrastructure, a data enclave expected to sunset and be replaced in the upcoming years, reviewing the core factors that reduced its usefulness for the community. We discuss challenges to researchers in terms of usability, including challenges involving learning to use core technologies, working with data that cannot be directly viewed, debugging, and working with restricted outputs. Our post-mortem discusses possibilities for ways that future infrastructures could get past these challenges.
Ha Nguyen and Saerok Park | Providing Automated Feedback on Formative Science Assessments: Uses of Multimodal Large Language Models | Formative assessment in science education often involves multimodality and combines textual and visual representations. We evaluate the capacity of multimodal large language models (MLLMs), including Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Flash, and OpenAI's GPT-4o and GPT-4 Turbo, to generate automated evaluations of multimodal science assessments. Overall, the MLLMs can accurately transcribe students' hand-written text. The best-performing models (Claude and GPT-4o) show moderate to substantial agreement with human evaluators in assessing students' scientific reasoning. MLLMs provided with example responses, scores, and explanations (few-shot learning) generally perform better than those without examples (zero-shot learning). Thematic analysis reveals cases where the models misevaluate the depth in students' answers, add details not included in the inputs (i.e., hallucinate), or show incorrect numerical reasoning. Findings demonstrate the feasibility of and considerations for using MLLMs to provide timely feedback for science assessments. Such feedback can help to revise students' understanding and inform teachers' instructional practices.
Conrad Borchers, Jeroen Ooge, Cindy Peng and Vincent Aleven | How Learner Control and Explainable Learning Analytics About Skill Mastery Shape Student Desires to Finish and Avoid Loss in Tutored Practice | Intelligent tutoring systems often apply AI to enhance student practice by supporting personalized learning, effort regulation, and goal setting. Much research has focused on integrating learner control and transparent decision-making into these systems, but learners are rarely engaged in the process of selecting learning materials. We explored how different levels of such control, combined with showing learning analytics on skill mastery, and visual what-if explanations, can support students with homework practice. Semi-structured interviews with six middle school students revealed three key insights: (1) participants highly valued learner control for an enhanced learning experience and better self-regulation, especially because most wanted to avoid losses in skill mastery; (2) only seeing their skill mastery estimates often made participants focus on their weaknesses; and (3) what-if explanations stimulated participants to focus more on their strengths and improve skills until they were mastered. These findings show how explainable learning analytics can shape students' selection strategies when they have control over what to practice. Furthermore, they suggest promising avenues for helping students learn to regulate their effort, motivation, and goals during homework. |
Alona Strugatski and Giora Alexandron | Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | Generative AI is transforming the educational landscape, raising significant concerns about cheating. Despite the widespread use of multiple-choice questions (MCQs) in assessments, the detection of AI cheating in MCQ-based tests has been almost unexplored, in contrast to the focus on detecting AI cheating in text-rich student outputs. In this paper, we propose a method based on the application of Item Response Theory (IRT) to address this gap. Our approach operates on the assumption that artificial and human intelligence exhibit different response patterns, with AI cheating manifesting as deviations from the expected patterns of human responses. These deviations are modeled using Person-Fit Statistics (PFS). We demonstrate that this method effectively highlights the differences between human responses and those generated by premium versions of leading chatbots (ChatGPT, Claude, and Gemini). Furthermore, we show that the chatbots differ in their reasoning profiles. Our work provides both a theoretical foundation and empirical evidence for the application of IRT to identify AI cheating in MCQ-based assessments.
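One standard person-fit statistic of the kind this abstract references is the standardized log-likelihood l_z (whether the paper uses this particular PFS is not stated here); a sketch with made-up probabilities.

```python
import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic for one respondent:
    u are 0/1 responses, p the IRT-implied correctness probabilities."""
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - mean) / np.sqrt(var)

# Hypothetical model-implied probabilities for six items, easy to hard
p = np.array([0.9, 0.8, 0.7, 0.5, 0.3, 0.2])
human_like = np.array([1, 1, 1, 1, 0, 0])  # follows the difficulty ordering
aberrant = np.array([0, 0, 1, 0, 1, 1])    # misses easy items, hits hard ones
print(lz(human_like, p), lz(aberrant, p))  # aberrant pattern scores far lower
```

Strongly negative l_z flags response patterns unlikely under a human-calibrated IRT model: the kind of deviation the paper attributes to AI responses.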
Zuo Wang, Weiyue Lin and Xiao Hu | Self-service Teacher-facing Learning Analytics Dashboard with Large Language Models | With the rise of online learning platforms, the need for effective learning analytics (LA) has become critical for teachers. However, the development of traditional LA dashboards often requires technical expertise and a certain level of data literacy, preventing many teachers from integrating LA dashboards effectively and flexibly into their teaching practice. This paper explores the development of a self-service teacher-facing learning analytics dashboard powered by large language models (LLMs), for improving teaching practices. By leveraging LLMs, the self-service system aims to simplify the implementation of data queries and visualizations, allowing teachers to create personalized LA dashboards using natural language. This study also investigates the capabilities of LLMs in generating charts for LA dashboards and evaluates the effectiveness of the self-service system through usability tests with 15 teachers. Preliminary findings suggest that LLMs demonstrate high capabilities in generating charts for LA dashboards, and the LLM-powered self-service system can effectively address participating teachers' pedagogical needs for LA. This research contributes to the ongoing research on the intersection of LLMs and education, emphasizing the potential of self-service systems to empower teachers in daily teaching practices.
Thorben Jansen, Andrea Horbach and Jennifer Meyer | Feedback from Generative AI: Correlates of Student Engagement in Text Revision from 655 Classes from Primary and Secondary School | Writing is fundamental in knowledge-based societies, and engaging students in text revision through feedback is critical to developing writing skills. While automated feedback offers a promising solution to the time constraints teachers face in creating feedback, prior research indicates that 20 to 71 percent of students receiving feedback are non-adherent to the intervention and do not engage in any text revision. Despite these concerning figures, the non-adherence issue has not gained widespread attention, likely due to fragmented evidence from a few grade levels and writing tasks disconnected from regular teaching. Further, current literature cannot answer whether the issue persists when generative AI generates the feedback. We analyze data from an educational technology company, including 655 teacher-generated writing tasks involving 14,236 students across grades 1-12 and multiple subjects receiving feedback from generative AI. Our findings show that around half of the students who received feedback did not engage in text revision, regardless of grade level, task type, or feedback characteristics, and thus indicate a robust non-adherence issue. We discuss the importance of including the percentage of non-adherent students as a key metric in feedback research, so that feedback is optimized both to have large effects on average and to leave no student behind.
Conrad Borchers and Ryan Baker | ABROCA Distributions For Algorithmic Bias Assessment: Considerations Around Interpretation | Algorithmic bias continues to be a key concern of learning analytics. We study the statistical properties of the Absolute Between-ROC Area (ABROCA) metric, a measure designed to detect differences in classifier performance between subgroups even when overall Area Under the ROC Curve (AUC) values are similar. We sample ABROCA under various conditions, including varying AUC differences and class distributions. We find that ABROCA distributions exhibit high skewness dependent on sample sizes, AUC difference, and class imbalance. When assessing whether a classifier is biased, this skewness inflates ABROCA values by chance, even when data is drawn (by simulation) from populations with equivalent ROC curves. These findings suggest that ABROCA requires careful interpretation given its distributional properties, especially when used to assess the degree of bias and when classes are imbalanced. |
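As a brief gloss on the metric studied above: ABROCA is conventionally defined as the unsigned area between the ROC curves of two subgroups, integrated over the false-positive rate:

```latex
% ABROCA: absolute area between the ROC curves of subgroups g_1 and g_2,
% with t the false-positive rate; identical subgroup curves give ABROCA = 0
\mathrm{ABROCA} = \int_{0}^{1} \bigl|\, \mathrm{ROC}_{g_1}(t) - \mathrm{ROC}_{g_2}(t) \,\bigr| \, dt
```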
Darren Butler, Conrad Borchers, Michael Asher, Yongmin Lee, Sonya Karnataki, Sameeksha Dangi, Samyukta Athreya, John Stamper, Amy Ogan and Paulo Carvalho | Does the Doer Effect Generalize To Non-WEIRD Populations? Toward Analytics in Radio and Phone-Based Learning | The Doer Effect refers to the positive relationship between active practice activities and learning in technology-supported learning environments. Most of the evidence, though broad, has been in the context of tutored practice in Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations in North America and Europe. The present study asks: Does the Doer Effect generalize to non-WEIRD populations, where active practice in remote locales occurs through mobile learning applications and radio-based instruction? We offer evidence from N = 234 students in Uganda practicing through radio and phone-based instruction, including quizzing, as one common form of active practice. Our findings support the hypothesis that active practice improves learning outcomes. We generate analytics of learning that describe practice outcomes in these emerging learning settings, finding that learners with higher prior educational attainment show somewhat weaker Doer Effect correlations. This motivates further study of the Doer Effect in diverse populations, as considering more contextual factors could make active practice even more effective. Our findings contribute to the emerging literature on out-of-classroom learning analytics settings, suggesting that insights from prior work may generalize to them. |
Mengtong Xiang, Jingjing Zhang, Mohammed Saqr, Han Jiang and Wei Liu | Capturing the Temporal Dynamics of Learner Interactions in MOOCs: A Comprehensive Approach with Longitudinal and Inferential Network Analysis | While research on social network analysis is abundant, temporal network analysis is less common, and research using inferential temporal network methods is nearly nonexistent. This paper aims to fill this gap by providing a comprehensive methodological approach using temporal networks and inferential longitudinal network analysis methods in the context of learner interactions in Massive Open Online Courses (MOOCs). We focus on three prominent methods: Temporal Network Analysis (TNA), Temporal Exponential Random Graph Models (TERGM) and Simulation Investigation for Empirical Network Analysis (SIENA). Using a five-week Nature Education MOOC as a case study, we compared each method's features and metrics, as well as the conceptualization of networks each brings to analyzing learner interactions. TNA focuses on describing and visualizing temporal changes in network structure, while TERGM and SIENA view networks as evolving systems influenced by individual behaviors and structural dependencies. TERGM treats network change as a joint random process, while SIENA emphasizes the agency of learners and analyzes continuous network evolution. The findings provide guidelines for researchers and educators to select appropriate network analysis methods for temporal studies in educational contexts. |
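For orientation, the discrete-time TERGM mentioned above is conventionally written as an exponential-family model conditioning each network snapshot on the previous one; this is the standard textbook form, not a formula taken from the paper:

```latex
% One-step TERGM: the network at time t conditioned on the network at t-1;
% g(.) are network statistics (e.g., edges, reciprocity, lagged ties),
% and c is the normalizing constant over all possible networks
P\bigl(Y^{t} = y^{t} \mid Y^{t-1};\, \theta\bigr)
  = \frac{\exp\bigl\{\theta^{\top} g\bigl(y^{t}, y^{t-1}\bigr)\bigr\}}{c\bigl(\theta,\, y^{t-1}\bigr)}
```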
Mamta Shah, Yuanru Tan, Brendan Eagan, Brittny Chabalowski and Yahan Chen | A Dual-Method Examination of Nursing Students' Teamwork in Simulation-Based Learning: Combining CORDTRA and Ordered Network Analysis to Reveal Patterns and Dynamics | This study examines nursing students' teamwork during a simulated pediatric scenario by combining Chronologically Ordered Representations of Discourse and Tool-Related Activity (CORDTRA) with Ordered Network Analysis (ONA). CORDTRA revealed each dyad's progression and critical moments during the scenario, while ONA illustrated how roles were divided. Our findings show that patient and parent interactions, education, and assessments were typically shared between students, whereas technical tasks such as dosage calculations were led by one student with support from the other. These findings highlight the nuanced ways in which manikin-based simulations foster essential teamwork skills, such as communication, task delegation, and problem-solving. The study also demonstrates the theoretical benefit of integrating CORDTRA and ONA to capture both temporal and relational dynamics, along with the practical implication that targeted feedback and debriefing informed by these methods can enhance nursing students' individual and team performance, and by extension their practice readiness. |
Luiz Rodrigues, Cleon Pereira Junior, Newarney Costa, Hyan Batista, Luiz Felipe Bagnhuk Silva, Weslei Chaleghi de Melo, Dragan Gasevic and Rafael Ferreira Mello | LLMs Performance in Answering Educational Questions in Brazilian Portuguese: A Preliminary Analysis on LLMs Potential to Support Diverse Educational Needs | Question-answering systems facilitate adaptive learning and respond to student queries, making education more responsive. Despite that, challenges such as natural language understanding and context management complicate their widespread adoption, and Large Language Models (LLMs) offer a promising solution. However, existing research predominantly focuses on English, proprietary models, and often a single question type, subject, or skill, leaving a gap in understanding LLMs' performance in languages like Brazilian Portuguese and across questions of various characteristics. This study investigates how LLMs could be efficiently integrated into an educational question-answering system to answer different question types (multiple-choice, cloze, open-ended), subjects (mathematics and Portuguese language), and skills (addition/subtraction, multiplication, interpretation, and grammar), evaluating answers from GPT-4 - the leading LLM at the time of writing - and Sabiá - an open-source Brazilian Portuguese LLM - based on grades assigned by two experienced teachers. Both LLMs demonstrated strong overall performance, with mean scores close to 9.8 out of 10. However, specific challenges emerged, with distinct strengths and weaknesses observed for each model, such as GPT-4's error in a multiple-choice subtraction question and Sabiá's misinterpretation of a cloze question. |
Zaibei Li, Shunpei Yamaguchi and Daniel Spikol | OpenMMLA: an IoT-based Multimodal Data Collection Toolkit for Learning Analytics | Multimodal Learning Analytics (MMLA) expands traditional learning analytics into digital and physical learning environments, using diverse sensors and systems to collect information about education in more real-world settings. Challenges remain in making these technologies practical for capturing data in authentic learning situations. With the advent of readily accessible, powerful artificial intelligence, including multimodal large language models, new opportunities are available. However, few approaches provide access to these technologies, and most systems are developed for specific environments. Recent work has begun producing toolkits for collecting, processing, and analyzing sensor data, yet these tools remain challenging to integrate into a single system. This paper introduces OpenMMLA, a toolkit approach that provides interfaces for harnessing these technologies in an MMLA platform. The toolkit interface allows for audio analysis of conversations and positional tracking of people using badges with fiducial markers and Bluetooth. The toolkit also allows for the local use of LLMs for speech-to-text and descriptive analysis of video, and provides a dashboard for real-time and post-hoc analysis of collected data. The paper also provides an initial system evaluation. |
Kevin Huang, Rafael Ferreira Mello, Cleon Pereira Junior, Luiz Rodrigues, Martine Baars and Olga Viberg | That's What RoBERTa Said: Explainable Classification of Peer Feedback | Peer feedback (PF) is essential for improving student learning outcomes, particularly in Computer-Supported Collaborative Learning (CSCL) settings. When using digital tools for PF practices, student data (e.g., PF text entries) is generated automatically. Analyzing these large datasets can enhance our understanding of how students learn and help improve their learning. However, manually processing these large datasets is time-intensive, highlighting the need for automation. This study investigates the use of six Machine Learning models to classify PF messages from 231 students in a large university course. The models include Multi-Layer Perceptron (MLP), Decision Tree, BERT, RoBERTa, DistilBERT, and ChatGPT-4o. The models were evaluated based on Cohen's kappa, accuracy, and F1-score. Preprocessing involved removing stop words, and the impact of this on model performance was assessed. Results showed that only the Decision Tree model improved with stop-word removal, while performance decreased for the other models. RoBERTa consistently outperformed the others across all metrics. Explainable AI was used to understand RoBERTa's decisions by identifying the most predictive words. This study contributes to the automatic classification of peer feedback, which is crucial for scaling learning analytics and providing timely support for students in CSCL settings. |
Bailing Lyu, Chenglu Li, Hai Li, Hyunju Oh, Yukyeong Song, Wangda Zhu and Wanli Xing | Exploring the Role of Teachable AI Agents' Personality Traits in Shaping Student Interaction and Learning in Mathematics Education | With advancements in artificial intelligence (AI), educational researchers have integrated AI into mathematics education to offer scalable instructional practices and personalized learning. One such innovation is teachable AI agents, designed as learners to facilitate learning by teaching. While prior research supports the benefits of learning by teaching, its effectiveness depends on the quality of tutor-tutee interaction. However, few studies have explored how features of teachable agents, particularly personality traits, influence student interactions and the agents' effectiveness. Given the documented importance of personality traits in student learning, this empirical study examines the relationship between teachable AI agents' personality traits and students' math learning experiences in a naturalistic setting. Results indicate that students provided more cognitive support when interacting with agents characterized by neuroticism, openness, and conscientiousness, while more affect management and non-responsive behaviors were observed with agents displaying extraversion. These interaction patterns impacted the effectiveness of the teachable agents, providing implications for the integration of AI systems in education. |
Alejandro Ortega-Arranz, Paraskevi Topali and Inge Molenaar | Configuring and Monitoring Students' Interactions with Generative AI Tools: Supporting Teacher Autonomy | The widespread use of Generative Artificial Intelligence (GenAI) tools, such as ChatGPT, has brought multiple benefits to education (e.g., acting as a 24-hour teacher). However, these tools also hinder teachers' autonomy, limiting their capacity and freedom to exert control over students' actions and their learning process. Additionally, the generic character of GenAI output usually lacks contextualization (e.g., to the course curriculum), thus hampering the successful attainment of course goals. To address these issues, this paper proposes the development of a system mediating between GenAI interfaces and their back-ends. This system allows teachers to monitor students' interactions and align the given answers with the course learning objectives and teaching methods. This research follows the Systems Development Research methodology; within the first iteration, we developed a prototype that was evaluated with 8 secondary-school teachers. Results showed a high perceived usefulness of the system for monitoring students' interactions, for alerting teachers to take specific actions (e.g., on suspicious copy-paste behaviours), and for having control over GenAI outputs. Additionally, while most teachers perceived a higher level of autonomy within the given scenarios, some did not. The evaluation also served to collect further requirements and usability features to keep improving the tool. |
Hongchen Pan, Eduardo Araujo Oliveira and Rafael Ferreira Mello | Exploring Human-AI Collaboration in Educational Contexts: Insights from Writing Analytics and Authorship Attribution | This research investigates the characteristics of student essays written with and without generative AI assistance, using stylometric analysis and deep learning techniques to explore human-AI collaboration in academic writing. To address three research questions, the study examines: (1) patterns in vocabulary diversity, sentence structure, and readability in AI-generated versus student-written essays; (2) the development of a stylometry-based BERT model for authorship attribution, focusing on linguistic features to accurately distinguish between student and AI-generated content; and (3) the application of this model to measure AI involvement at the sentence level in collaborative essays. Using a dataset of student and AI-assisted essays, we observed distinct stylistic differences, with AI-generated content exhibiting higher lexical diversity and readability scores. The BERT model demonstrated high accuracy (85%), precision (79%), and F1-scores (74%) in identifying AI contributions, surpassing the adopted baseline. While limitations such as dataset imbalance and variability in AI outputs remain, this study highlights the potential of stylometric analysis in improving authorship attribution and quantifying AI involvement in academic writing. These findings provide educators with tools to monitor student progress, offer personalised feedback, and maintain academic integrity in the face of growing AI usage in education. |
Hai Li, Wanli Xing, Chenglu Li, Wangda Zhu, Bailing Lyu, Fan Zhang and Zifeng Liu | Who Should Be My Tutor? Analyzing the Interactive Effects of Automated Text Personality Styles Between Middle School Students and a Mathematics Chatbot | Engaging with instructors through question-and-response techniques is an efficient method for delivering mathematics instruction to middle school learners. The flexible nature and sophisticated functionality of large language models (LLMs) have fueled interest in automating this process to strengthen students' mathematical understanding, with the chatbot's personality serving as an essential aspect of its design. While much research has explored students' preferences for chatbot personalities, preferences in the context of learning gains, considering students' own personalities, remain unclear. This study draws on QA dialogue logs between middle school students and a chatbot from a U.S.-based online mathematics learning platform. An automated feature extraction framework was designed to analyze text style from a personality perspective, extracting features including emotional polarity (reflecting emotional arousal), subjectivity (degree of subjective-neutral expression), and the big five personality traits (indicating potential personality tendencies). Linear regression was then used to analyze the relationship between these features and students' learning gains in mathematics. Our findings support the complementary hypothesis from interpersonal interaction theory, which posits that students prefer chatbot personalities that complement their own. We discuss the implications for instructional design. Our analysis contributes to the development of more effective conversational AI applications in educational technology. |
Wangda Zhu, Wanli Xing, Bailing Lyu, Fan Zhang, Chenglu Li and Hai Li | Bridging the Gender Gap: The Role of AI-Powered Math Story Creation in Learning Outcomes | Addressing the gender gap in K-12 math education is essential for providing equitable learning opportunities, as historical disparities in engagement, performance, and confidence between male and female students in mathematics are often linked to educational biases. Integrating Generative AI (GAI) into math education shows promise for bridging the gender gap in K-12 math learning. This study proposes an innovative pedagogy and platform that enables students to create math stories powered by GAI, enhancing their conceptual understanding of key mathematical ideas. The platform was implemented in two K-5 schools to evaluate its effectiveness and mechanism (N = 86). Pre- and post-intervention surveys and usage logs indicated significant improvements in students' learning outcomes regarding Math Question (MQ) skills and Math Story (MS) skills. Bayesian SEM further modeled the mechanism: creating math stories powered by GAI significantly improves MS, which in turn improves MQ. We further found that female students were significantly more engaged in creating stories on the platform and showed greater improvement in MQ than male students. The results suggest that AI-powered math story creation can be an effective tool for deepening students' mathematical learning and has the potential to mitigate the gender gap. |
Fanjie Li and Alyssa Wise | From Filling Gaps to Amplifying Strengths: Exploring an Asset-Based Approach to Learning Analytics | This paper explores how an asset-based lens can help expand the design space of learning analytics beyond a deficit-oriented approach focused mainly on identifying and remedying gaps to one that elevates every student's strengths and potentials. To do so, we draw on the rich history of asset-based pedagogies to consider expansive possibilities for the kinds of information that analytics uncover, the processes that analytics support, and the outcomes that analytics seek to engender. To explore the value and feasibility of an asset-based lens for learning analytics, this paper instantiates the approach with a proof-of-concept prototype designed to support teachers' noticing of student contributions to discussions in a K-12 science learning classroom. The illustrative case demonstrates ways in which the analytic capabilities of large language models can be leveraged to make visible, and create opportunities for teachers to build upon, the funds of knowledge that students bring to the classroom, thereby creating spaces for minoritized students' agentive engagement. |
Arash Ahrar, Mohammadreza Doroodian and Marek Hatala | Exploring Eye-tracking Features to Understand Students' Sensemaking of Learning Analytics Dashboards | Learning analytics dashboards (LADs) are widely used in learning analytics as visual tools to present information about learning activities and outcomes. However, only a few studies have explored how students make sense of LAD elements and what cognitive processes follow viewing each element. In this study, we explore how eye-tracking data can help researchers identify salient LAD elements critical to students' sensemaking process. Our findings reveal that eye-tracking-derived features, including fixation duration and eye movement patterns, are highly indicative of students' social comparison tendencies and offer valuable insights into their sensemaking processes. |
Seehee Park, Nia Dowell, Sidney D'Mello and Danielle Shariff | Understanding Collaborative Learning Processes and Outcomes Through Student Discourse Dynamics | This study explores the relationship between students' discourse dynamics and performance during collaborative problem-solving activities using Linguistic Inquiry and Word Count (LIWC). We analyzed linguistic variables from students' communications to explore social and cognitive behavior. Participants include 279 undergraduate students from two U.S. universities engaged in a controlled lab setting using the physics-related educational game Physics Playground. Findings highlight the relationship between social and cognitive linguistic variables and students' physics performance outcomes in a virtual collaborative learning context. This study contributes to a deeper understanding of how these discourse dynamics relate to learning outcomes in collaborative learning. It provides insights for optimizing educational strategies in collaborative remote learning environments. We further discuss the potential for conducting computational linguistic modeling on learner discourse and the role of natural language processing in deriving insights on learning behavior to support collaborative learning. |
Joy Yun, Allen Nie, Emma Brunskill and Dorottya Demszky | Exploring the Benefit of Customizing Feedback Interventions For Educators and Students With Offline Contextual Multi-Armed Bandits | Automated feedback to teachers powered by natural language processing has been successful at improving instruction and student outcomes across various learning contexts. However, existing one-size-fits-all feedback interventions are not equally effective for all educators and students. Understanding whether and how customization might enhance the effectiveness of automated feedback could improve the impact of such interventions. We focus this investigation on data from a randomized controlled trial conducted in a peer SAT math tutoring program, investigating the utility of providing tutors and/or learners with feedback on their discourse during tutoring sessions. We employ a partially data-driven, partially expert-knowledge-driven process to propose potential context-specific intervention policies. We then use offline contextual multi-armed bandit policy evaluation measures to estimate the potential performance of these interventions compared to providing a single intervention designed to maximize overall average performance. Our preliminary results suggest that there may be value in providing differentiated interventions, and point to the potential for such analysis to be used as a hypothesis-generating tool for future empirical studies. |
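For readers new to offline bandit evaluation: the standard starting point is the inverse propensity scoring (IPS) estimator, which reweights logged outcomes to estimate how a different policy would have performed. The abstract does not state which estimator the authors use, so this is only an illustrative sketch:

```latex
% IPS estimate of the value of a target policy \pi from logged triples
% (x_i, a_i, r_i) of context, action, and reward gathered under a known
% logging policy \mu (illustrative only; the paper does not specify its estimator)
\hat{V}_{\mathrm{IPS}}(\pi) = \frac{1}{n} \sum_{i=1}^{n}
  \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)} \, r_i
```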
Luis P. Prieto, Riordan Alfredo, Henry Benjamín Díaz-Chavarría, Roberto Martinez-Maldonado and Vanessa Echeverría | VALA/AID: A Method for Rapid, Participatory Value-sensitive Learning Analytics and Artificial Intelligence Design | The adoption of learning analytics (LA) and artificial intelligence (AI) in education has long been a challenge, in part due to the ethical issues it engenders (e.g., the alignment of technology-embedded and human stakeholder values). Value-sensitive design (VSD) is a human-centered and theory-grounded approach to technology design that explicitly elicits and accounts for human values. However, there is scant concrete guidance on how to involve students in co-designing LA technologies from a VSD perspective in an efficient manner. This paper presents a novel method, called VALA/AID, to elicit student values, challenges and motivations in the early stages of an LA design process. We briefly illustrate the application of the method and the kind of evidence and design insights that can be distilled from it, for a relatively underexplored context in LA research: doctoral education. |
Markus Wolfgang Hermann Spitzer, Lisa Bardach, Younes Strittmatter and Korbinian Moeller | Usage and performance declines in a classroom-integrated digital learning software over the course of an academic year | In increasing numbers of classrooms worldwide, students use digital learning software. However, we know little about the trajectories of usage and performance within such software over the academic year. This study analyzed real-world longitudinal data from a mathematics learning software used in classrooms in Germany and the Netherlands (~16,000 students who worked on >23 million problems). We evaluated students' usage and performance trajectories across an academic year by examining the percentage of students using the software, worked-through problems, active days and weeks, as well as performance. Our results indicate a decline in both usage and performance over the course of the academic year, with overall lower usage in Germany than in the Netherlands. Our findings highlight the need for further research into the factors maintaining or increasing usage of, and performance in, classroom-integrated digital learning software over extended periods. |
Yiming Liu, Zhengyang Ma, Jeremy Tzi-Dong Ng and Xiao Hu | Multimodal learning analytics for game-based assessment of collaborative problem solving skills among young students | Collaborative Problem Solving (CPS) has emerged as a key competence for the 21st century. In support of this, valid assessments of CPS skills have become critical. However, limited research has designed and developed CPS assessments for young students. Based on multimodal learning analytics (MMLA), we aim to develop and validate a game-based assessment of CPS for primary school students. In this study, an evidence-centered design approach was used to design and develop the game-based CPS assessment. Specifically, based on the ATC21S CPS framework, we designed and developed a mobile multiplayer online 3D role-playing game on CPS and a coding scheme for students' gameplay data (i.e., game logs and voice chat). A total of 32 Primary 5 students participated in this study, playing the game in groups of four and completing a questionnaire on CPS skills. The gameplay data were coded based on our coding scheme. Correlation analysis between the coded results and the CPS questionnaire data supported the criterion validity of our game-based assessment measure. Additionally, the results of expert interviews facilitated our understanding of assessment design and data use. This study makes methodological and practical contributions to the integration of MMLA into game-based CPS assessments. |
Ben Hicks and Kirsty Kitto | Game Theoretic Models of Intangible Learning Data | Learning analytics is full of situations where features essential to understanding the learning process cannot be measured. The cognitive processes of students, their decisions to cooperate or cheat on an assessment, and their interactions with class environments can all be critical contextual features of an educational system that are impossible to measure. This leaves an empty space where essential data is missing from our analysis. This paper proposes the use of game-theoretic models as a way to explore that empty space and potentially even to generate synthetic data for our models. Cooperating or free-riding on the provision of feedback in a class activity is used as a case study. We show how our initially simple model can gradually be built up to help understand potential educator responses as new situations arise, using the emergence of GenAI in the classroom as a case in point. |
Jinwon Kim, Qiujie Li, Zilu Jiang and Di Xu | Not ALL Delay is Procrastination: Analyzing Subpatterns of Academic Delayers in Online Learning | In prior literature using clickstream data to capture student behavior in virtual learning environments, procrastination is typically measured by the extent to which students delay their coursework. However, students may delay coursework due to personal and environmental circumstances, and not all delays should be considered procrastination. Thus, this study aims to identify different types of delayers and examine how they differ in academic engagement and performance. We utilized learning management system (LMS) data from three online undergraduate courses. Specifically, using data from the first three weeks of each course, we classified delayers into three subgroups – high-achieving, low-achieving, and sporadic delayers – based on the timing of their coursework access and submission, the consistency of these behaviors, and their short-term course performance. Our findings reveal that the subgroups significantly differ in course engagement and long-term performance. Low-achieving delayers exhibited the lowest levels of engagement and performance. While sporadic delayers and high-achieving delayers demonstrated comparable levels of engagement, the latter received higher course grades. These findings challenge commonly used LMS measures of procrastination, highlight the complexity of academic delays, and reveal nuanced patterns of student behavior. The results contribute to discussions on future interventions and research related to distinct forms of delay. |
Steven Yeung | Examining the performance of automated writing evaluation (AWE) approaches: A comparative analysis of rule-based, machine learning and large language model approaches | Automated Writing Evaluation (AWE) tools have proved beneficial to writing development, and research on AWE methods is essential for improving tool performance. As new methods emerge, further comparative studies are needed. This study examines the performance of three AWE approaches in assessing academic essays from the TOEFL11 dataset: a rule-based and statistical approach, machine learning (ML) models, and a large language model (LLM). The results show that GPT-4 outperformed the other two approaches in terms of quadratic weighted kappa (QWK), Pearson's correlation coefficient, and mean absolute error, while the Support Vector Machine (SVM) model achieved the highest accuracy. The paper provides a detailed comparison of the three approaches and discusses implications for future research in the area of AWE. |
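As a quick reference, the quadratic weighted kappa (QWK) reported above is conventionally computed from observed and expected rating counts as:

```latex
% QWK over k ordinal score categories: O_{ij} is the observed count of
% essays given human score i and system score j; E_{ij} is the count
% expected under independence of the two raters
w_{ij} = \frac{(i - j)^2}{(k - 1)^2}, \qquad
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}}
```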
Hyunju Oh, Zifeng Liu and Wanli Xing | Do Actions Speak Louder Than Words? Unveiling Linguistic Patterns in Online Learning Communities Using Cross Recurrence Quantification Analysis | This study explores the dynamics of engagement in online learning communities (OLCs), focusing on online math discussion forums. It employs Social Network Analysis (SNA) and Cross-Recurrence Quantification Analysis (CRQA) to examine interaction patterns and linguistic synchrony across participant clusters with varying levels of engagement. SNA reveals three distinct participant groups (core, intermediate, and peripheral), each exhibiting different interaction levels. The study's findings highlight the significance of coordinated discourse in fostering collaborative learning and engagement in OLCs. This research contributes to the theoretical framework of Social Capital Theory by emphasizing the role of shared language in promoting cohesive communication. The results offer valuable insights for designing more effective online learning environments that encourage sustained student participation and knowledge construction. |
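For context, categorical CRQA of dialogue typically builds a cross-recurrence matrix over two participants' word sequences and summarizes it with the recurrence rate; the following is an illustrative formulation, not quoted from the paper:

```latex
% Cross-recurrence between sequences x (length N_x) and y (length N_y):
% R_{ij} = 1 when elements match (e.g., the same word or lemma), else 0.
% RR measures the overall proportion of shared language between speakers.
\mathrm{RR} = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} R_{ij}
```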
Practitioner Reports
*Please note these titles and abstracts may be subject to change as they are listed with pre-publication information. Any changes to titles and/or abstracts will be updated soon.
Authors | Title | Abstract |
Marije Goudriaan, Anouschka van Leeuwen and Ünal Aksu | Lessons Learnt on How to Combine Top-Down Facilities for Learning Analytics with Bottom-Up Initiatives | In this practitioner paper, we describe a university's general approach to initiating LA projects, namely a combination of top-down and bottom-up approaches, and evaluate it based on the results of two LA projects. Project 1 provides study delay predictions via a study advisor dashboard, and Project 2 enables students to self-monitor academic writing skills through a student dashboard. Smooth coordination between pedagogy, privacy, and technology resulted in the successful realization of these projects. However, the adoption of the developed dashboards by end-users was limited. We discuss potential causes, solutions, and general recommendations for institutions that are working on the adoption of LA. |
Shaun Kellogg and Thomas Tomberlin | Leveraging Learning Analytics to Support North Carolina's Advanced Teaching Roles Program | The North Carolina Teacher Compensation and Advanced Teaching Roles (ATR) program allows public school units to develop innovative teacher compensation models designed to improve student and teacher outcomes. The program enables highly effective teachers, known as Advanced Teachers, to either take responsibility for more students or lead small teams of teachers by providing professional development, coaching, and instructional support. The North Carolina Department of Public Instruction selected the Friday Institute for Educational Innovation at North Carolina State University as their research partner. This partnership has two primary goals: 1) to assess the academic and professional impact of ATR programs, and 2) to understand and improve their implementation. Using a collaborative data-intensive improvement research framework, the research employs a variety of methods, including both conventional qualitative and statistical methods, as well as more novel approaches drawn from the field of learning analytics such as data dashboards, epistemic network analysis, and machine learning. |
Matthias Ehlenz and René Röpke | A Human-centered Approach on Collecting Learning Analytics Insights in GitLab | This practitioner's report presents a human-centered approach to the collection of learner data from version control systems in software projects. With platforms like GitLab widely used in computer science courses in higher education, instructors have access to the platform's usage data and can incorporate it into their grading. So far, metrics like lines of code, number of commits, and commit message quality have been used to draw conclusions about learners' performance, especially in comparative analyses of group projects. Meanwhile, GitLab offers many more features and far richer information for data analysis, e.g., for project management. This work aims to make this information more accessible and usable in the context of learning analytics. It focuses on the stage of data collection and the implementation of the necessary software components, enabling data analysis in the future. |
Carleigh Young | Adopting Learning Analytics in a Business Intelligence Framework | This presentation explores how a small rural institution adopted Learning Analytics within a Business Intelligence framework to address enrollment challenges amidst restricted budgets, outdated systems, and limited resources. By focusing on cultural transformation, data preparation, and systems integration, this approach highlights strategies, challenges, and recommendations for establishing sustainable analytics practices in resource-limited contexts. |
Tom Olney | Evaluating learning analytics implementations: three approaches for practice | Approaches for evaluating learning analytics implementations are as varied as the definitions of success they attempt to measure. Practically, some have the potential to provide bounded insight and can be simple to apply, whilst others offer deeper, holistic understanding but can be complicated to manage. To extend practice and develop more effective, responsible, and successful implementations, it would be advantageous for practitioners to have broad knowledge of several methodologies that can be used for evaluating learning analytics implementations. |
Camille Kandiko Howson and Charlotte Whitaker | Learning Analytics as a Measure of Educational Gain | A national review of the measurement of students' learning gain in England identified student engagement as the greatest challenge. Students did not see the value, lacked the time or interest, or were not made sufficiently aware of opportunities to complete additional tests and surveys. To explore the educational gain of students without additional burden upon them, at Imperial College London we have spent the past four years exploring how to get the most out of the data we already have about students, primarily through their data trails across the institution and engagement with virtual learning platforms. This presentation explores the potential role of AI in supporting this, and the engagement with students and academic staff to capture data relevant to them. |
Angelica Risquez, Teresa Curtin, Chris McInerney, Michael P. O'Brien, Donal Palcic, John Walsh, Mohd Fazil and Sarah Gibbons | STELA Live: Implementing Learning Analytics for Student Success | This presentation describes a large cross-institutional project run at an Irish university which used learning analytics (LA) capabilities to enhance student engagement. The project was a first for the institution in several ways, taking a centralized and coordinated approach to the utilization of LA for the first time. The purpose of the project was to establish the infrastructure and framework necessary to provide learning interventions that could mitigate the risk of students underperforming in selected first-year modules. The predictive models utilized a combination of demographic data, continuous assessment scores, and student engagement data derived from the university's Virtual Learning Environment (VLE). After building these models with a cohort of 8,000 students over four academic years, a pilot intervention was designed in which about 2,000 students were notified of their likelihood of success based on the models' predictions and were directed toward available academic support services. The outcomes of this pilot intervention were evaluated, and the findings are shared to offer insights on how learning analytics can be applied in higher education to support student success, particularly in large, diverse cohorts. |
Yu-Hxiang Chen, Ju-Shen Huang, Jia-Yu Hung and Chia-Kai Chang | Leveraging Knowledge Graphs and Large Language Models to Track and Analyze Learning Trajectories | This study addresses the challenges of tracking and analyzing students' learning trajectories, particularly the issue of inadequate knowledge coverage in course assessments. Traditional assessment tools often fail to fully cover course content, leading to imprecise evaluations of student mastery. To tackle this problem, the study proposes a knowledge graph construction method based on large language models (LLMs), which transforms learning materials into structured data and generates personalized learning trajectory graphs by analyzing students' test data. Experimental results demonstrate that the model effectively alerts teachers to potential biases in their exam questions and tracks individual student progress. This system not only enhances the accuracy of learning assessments but also helps teachers provide timely guidance to students who are falling behind, thereby improving overall teaching strategies. |
Sandra Sawaya and Sidney D'Mello | How a Professional Learning Analytics Tool Enhances Coaching Practices in a High-Dosage Tutoring Program | This study investigates how coaches use a professional learning analytics tool (CAP) to provide feedback to tutors in the real-world context of a high-dosage tutoring program: ABC Tutoring. CAP provides analytics based on tutor discourse practices known as talk moves (e.g., pressing for reasoning, relating to one another). We conducted a think-aloud study with six coaches to understand how they make meaning from CAP analytics and use this tool in their coaching workflows. Coaches vary in how they use CAP to provide feedback for their tutors and findings suggest recommendations to improve the tool and the coach user experiences. This study also highlights the important role CAP plays in enhancing coaching practices. |
Wei Qiu, Maung Thway, Joel Weijia Lai and Fun Siong Lim | GenAI for teaching and learning: a Human-in-the-loop Approach | This paper presents the human-in-the-loop development and implementation of a Socratic generative artificial intelligence (GenAI) tutor for undergraduate statistics courses. GenAI has the potential to personalize learning and encourage desired deep-learning behaviors in a diverse student population. However, thorough planning and evaluation are essential to ensure responsible use of AI. Our systematic approach started with a GenAI tutor designed with course coordinators and instructors, followed by a trial phase involving student volunteers and instructors. The GenAI tutor was then piloted in a real class setting, with data collected on conversation logs, the experiences of both students and instructors, and the resulting outcomes. This approach fosters trust in GenAI and facilitates continuous improvement. The findings contribute to the ongoing discourse surrounding the use of AI in learning environments, with a particular focus on enhancing human capabilities. |
Inma Borrella and Eva Ponce-Cueto | Developing Metrics to Evaluate Student Engagement in MOOCs: A Learning Analytics Approach | This report presents the development and implementation of learning analytics metrics to evaluate student engagement in a MOOC-based program. The initiative aims to address the limitations of traditional evaluation methods by introducing a three-tiered system of metrics: real-time course monitoring, post-hoc analysis for course evaluation, and program-wide trend identification. These metrics offer practical insights into student behavior, support timely interventions, and guide course design improvements. Our findings highlight the critical impact of data-driven decision-making in online education, with implications for improving student outcomes and program management in MOOCs. |
Simon Rimmington and Rachel Maxwell | How engagement analytics is helping retain Foundation Year students at [University Name] | The [Uni] Foundation Year (UFY) is a route to undergraduate study for students who tend to be from underrepresented backgrounds and have had mixed experiences in their journeys through the education system. The UFY recognises the strategic value of learner analytics in addressing student engagement and retention. Analysing student data facilitates implementation of targeted interventions that boost engagement over the entire student journey. This presentation considers how data-driven insights enhance student success and learning experiences, examining the implementation of learner analytics, considerations surrounding student data, and their impact. These initiatives lead to evidence-based strategies that enhance inclusivity and student support, aligned to wider university priorities. |
Gavin Porter | Social annotation: metric and intention matching | Instructors desire quality dialogue and balanced interactions from students in discussion settings. Too many students in a forum may lead to information overload, while too few students may not be productive. A healthy blend of initiative-taking and responsive output from each student is also a reasonable pedagogical goal. Built-in social annotation platform data, organized on a per-document basis, lacks the student-centered longitudinal metrics that have been suggested in many prior dashboard studies. In this practitioner report, a longitudinal analysis of student social annotation output across numerous documents allowed flagging of students clearly outside an initiative/responsive norm. This suggests a valuable dashboard metric and an opportunity for early- to mid-course correction, as these same students would not have been identified by other, more obvious metrics. Cycling students through various group sizes also influenced the initiative/responsive balance, with desirable outcomes in two of the three sizes tested. Consideration of how metrics fit pedagogical goals will be essential in building future social annotation analytics dashboards. |
James Rauh, Edward Presutti, Hongli Xuan and Julie Fineman | Early detection of at-risk students in a DPT program utilizing predictive analytics and data visualization | This position paper describes the ongoing design and application of sequential data events to identify at-risk students in a Doctor of Physical Therapy (DPT) program. Data elements were aggregated for each student from external and internal data sources spanning application through the licensure examination. The goal of this initial project was to aggregate data into a cohesive data store and provide consolidated data visualization to facilitate data-informed decision-making. This allows faculty to monitor student success, identify when and where students are struggling, develop timely remediation practices, and improve future teaching and learning practices. Preliminary analysis links performance below a B in clinical decision-making didactic courses to first-time failure of the licensing board exam, the National Physical Therapy Examination (NPTE). Further data analytics are focused on identifying individual assessments early in the curriculum that correlate with poor NPTE performance, in order to flag at-risk students and allow for real-time educational interventions during specific courses. |
Nasrin Dehbozorgi and Mourya Teja Kunuku | Deep-Reflect: an LLM-based Reflection Analytics Tool | This practitioner report introduces an AI-based framework for analyzing students' reflections. Integrating Large Language Models (LLMs) into educational tools has revolutionized learning analytics by allowing complex analysis of textual data. Reflective writing is known to promote cognitive and metacognitive skills among students. However, providing timely feedback on these reflections is a time-intensive task for educators, often limiting its practice. This paper presents Deep-Reflect, an LLM-powered tool designed to automate the analysis of student reflections by extracting learning outcomes and challenges and visualizing them through a dynamic dashboard. The tool enables instructors to provide timely feedback and make data-driven interventions. A case study conducted in a graduate software engineering course showed that using Deep-Reflect significantly improved student performance. This finding highlights the potential of LLM-powered tools to enhance reflective learning and student outcomes in higher education settings. |
Wei Qiu, Andy Wai Hoong Khong and Fun Siong Lim | From Research to Practice: Translating a Dissertation-Based Solution into an Enterprise-Level System | Reducing student dropout rates and enhancing academic success are critical challenges in higher education. While predictive machine learning models have shown promise in identifying at-risk students, their practical deployment often remains elusive. This paper presents a scalable data pipeline to operationalize a suite of grade-prediction models developed during a Ph.D. program. By integrating Denodo, Dataiku, Snowflake, and Qlik Sense, we established a robust and secure data flow encompassing data collection, transformation, modeling, validation, and visualization. The pipeline automatically updates predictions every six months, enabling timely intervention by student care managers. This case study demonstrates the potential of AI and data science to improve student retention and foster academic success. |
Aditya Phatak, Suvir Bajaj, Rutvik Gandhasri, Pavan Kadandale, Jennifer Wong-Ma and Hadar Ziv | Evaluation and Learning Enhancement Via Automated Topic Extraction (ELEVATE) | Every term, instructors receive course evaluations that, in theory, should provide them with insights into student experiences in their course. However, manually identifying recurring themes and extracting actionable insights from potentially thousands of reactions is extremely time-consuming, if not impossible. We present Evaluation and Learning Enhancement Via Automated Topic Extraction (ELEVATE), a topic modeling tool designed to cluster student responses and extract latent themes, topics, and trends from large student evaluation datasets. ELEVATE offers an intuitive dashboard for users that effectively integrates qualitative (identification of topics) and quantitative (sentiment type and strength) analyses. Furthermore, this paper presents one case study to showcase its capabilities in learning analytics: an investigation on variations between offerings of a Computer Science course taught by multiple professors. |
Nicola Wilkin, Celia Greenway and Kelly Hamilton | The Birth of an Institutional Learner Analytics Platform | If you do not have a learner analytics platform as an institution, should you? And how might you design and implement it? This case study reviews the journey to launch of a learner analytics platform at an institution with c. 40,000 students. It considers the diverse needs and ambitions of all stakeholders. Central to all design decisions has been the question "how will this benefit the students?" We review our stepwise implementation, in which each step considers the scale of change in terms of the combined parameters of awareness of the platform, digital literacy of stakeholders, and signposting supportive actions. |