Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents

1 College of Computer Science and Technology, Zhejiang University
2 College of Education, Zhejiang University
3 Hong Kong University of Science and Technology

*Corresponding Author

Abstract

Large language models (LLMs) are revolutionizing education, with LLM-based agents playing a key role in simulating student behavior. A major challenge in student simulation is modeling the diverse learning patterns of students at different cognitive levels. Current LLMs, however, are typically trained as "helpful assistants" and aim to generate perfect responses. As a result, they struggle to simulate students with diverse cognitive abilities: they often produce overly advanced answers that miss the natural imperfections characterizing student learning, leading to unrealistic simulations. To address this issue, we propose a training-free framework for student simulation. We first construct a cognitive prototype for each student using a knowledge graph, which captures the student's understanding of concepts from past learning records. This prototype is then mapped to new tasks to predict student performance. Next, we simulate student solutions based on these predictions and iteratively refine them with a beam search method to better replicate realistic mistakes. To validate our approach, we construct the Student_100 dataset, consisting of 100 students working on Python programming tasks and 5,000 learning records. Experimental results show that our method consistently outperforms baseline models, achieving a 100% improvement in simulation accuracy and realism.

Introduction Figure

Existing LLM-based simulations struggle to replicate student behavior at varying cognitive levels, producing overly advanced responses that undermine the validity of the simulation.

Dataset Curation

Dataset Figure

To support our simulation task, we construct Student_100, a dataset of sequential programming records collected from PTA. Assuming short-term cognitive stability, we include only tasks completed by a student within a single week. Each record contains the task description, the student-written code, its correctness, and expert annotations of the student's behavior. For each of the 100 students, we select 40 past learning records for prototype construction and 10 records held out for simulation. Compared to existing datasets, Student_100 offers richer behavioral and sequential information, providing a solid foundation for modeling student reasoning.
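The record structure described above can be sketched with a simple schema. This is an illustrative assumption: the class and field names (`LearningRecord`, `StudentProfile`, `behavior_notes`, etc.) are ours, not the released dataset's actual format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LearningRecord:
    """One Student_100-style record: task, code, correctness, annotation."""
    task: str            # programming task description
    code: str            # student-written Python solution
    is_correct: bool     # whether the submission passed
    behavior_notes: str  # expert annotation of the student's behavior


@dataclass
class StudentProfile:
    """Per-student split: past records build the prototype, the rest are simulated."""
    student_id: str
    past_records: List[LearningRecord] = field(default_factory=list)        # 40 records
    simulation_records: List[LearningRecord] = field(default_factory=list)  # 10 records


# Example usage with a single hypothetical record
rec = LearningRecord(
    task="Sum the even numbers in a list.",
    code="def f(xs):\n    return sum(x for x in xs if x % 2 == 0)",
    is_correct=True,
    behavior_notes="Correct generator expression; no edge-case checks.",
)
student = StudentProfile(student_id="S001", past_records=[rec])
```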

Method

Method Figure

In the first stage, we construct a student cognitive prototype by iteratively building a knowledge graph from past learning records. This graph contains concepts, their relationships, and local cognitive state libraries. After processing these records, we assess the student's mastery of each concept to form a global cognitive prototype. In the second stage, we use this prototype to predict the student's behavior on new tasks. Unlike traditional retrieval methods, which rely on superficial task similarities and risk incorrect retrieval, our approach maps the cognitive prototype onto the task, identifying the concepts relevant to accurate predictions. In the third stage, we employ a beam search-based self-refinement process to ensure that the generated solution aligns with the predicted behavior, improving simulation authenticity.
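The first-stage idea of aggregating past records into a per-concept mastery estimate can be sketched as follows. This is a minimal illustration under our own assumptions: each record is reduced to a `(concepts, is_correct)` pair, whereas in the actual framework the concept tagging and assessment are performed by an LLM over the knowledge graph.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_cognitive_prototype(records: List[Tuple[List[str], bool]]) -> Dict[str, float]:
    """Estimate per-concept mastery (success rate) from past learning records.

    `records` is a list of (concepts, is_correct) pairs, where `concepts`
    tags the knowledge points a task exercises. The returned mapping plays
    the role of a global cognitive prototype in this simplified sketch.
    """
    attempts: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for concepts, is_correct in records:
        for concept in concepts:
            attempts[concept] += 1
            successes[concept] += int(is_correct)
    return {c: successes[c] / attempts[c] for c in attempts}


# Toy history: the student handles conditionals, is shaky on loops,
# and has not yet succeeded at recursion.
records = [
    (["loops", "conditionals"], True),
    (["loops"], False),
    (["recursion"], False),
]
proto = build_cognitive_prototype(records)
```

A new task tagged with `recursion` would then be predicted to fail, regardless of whether any past task looks superficially similar to it.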

Qualitative Result

Case Figure

Examples of simulated results. Similarity-based retrieval methods rely on superficial task similarities, leading to inaccurate behavior predictions. In contrast, our method maps the cognitive prototype to the concepts relevant to the new task, yielding accurate predictions. The self-refinement process then iteratively adjusts the generated solution to match the predicted behavior.
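The iterative self-refinement step can be sketched as a generic beam search over candidate revisions. This is a hedged illustration, not the paper's implementation: `propose` (which would wrap LLM revision calls) and `score` (alignment with the predicted behavior) are placeholder callables supplied by the caller.

```python
from typing import Callable, List, Tuple


def beam_refine(
    initial_solution: str,
    propose: Callable[[str], List[str]],  # generates candidate revisions of a solution
    score: Callable[[str], float],        # alignment with predicted behavior (higher is better)
    beam_width: int = 3,
    n_rounds: int = 2,
) -> str:
    """Beam-search self-refinement sketch: each round, expand every solution
    in the beam, keep the top-scoring candidates, and finally return the
    solution that best matches the predicted behavior."""
    beam: List[Tuple[float, str]] = [(score(initial_solution), initial_solution)]
    for _ in range(n_rounds):
        candidates = list(beam)
        for _, solution in beam:
            for revised in propose(solution):
                candidates.append((score(revised), revised))
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][1]


# Toy usage: "revisions" append characters, and the target behavior is a
# solution of length 5, so each round moves the beam closer to that length.
best = beam_refine(
    "x",
    propose=lambda s: [s + "a", s + "b"],
    score=lambda s: -abs(len(s) - 5),
)
```

In the actual framework, the score would reflect how well a candidate reproduces the predicted mistakes rather than a string-length target.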