Educational Data Mining Improves Students’ Lives

by Evelyn Levine

Dr. Ryan Baker is an Associate Professor at the University of Pennsylvania and the Director of the Penn Center for Learning Analytics. His research on Learning Analytics spans the fields of Educational Data Mining and Human-Computer Interaction. Dr. Baker will be giving a keynote speech at ICELW 2020, the 13th annual International Conference on E-Learning in the Workplace, to be held online from June 10 to 12. More information about the conference, along with a registration page, is at

I spoke with Ryan about his work; a slightly edited transcript of the interview is below.

Q: You study how engaged or disengaged students are while they use educational software. But how, exactly, do you do that?

A: We can identify if a student is disengaged by looking at what they do within learning technology. For example, are they gaming the system (asking for hints over and over just to get the answer) or systematically guessing until they get it right (e.g., A, B, C, D… A, B, C, D… A, B, C, D…)? Are they becoming careless, responding quickly and making dumb mistakes? Are they using a fake user account to harvest answers for their main account? And so on. My lab uses a combination of human expertise and data mining to identify these behaviors. First, we have experts help us find cases where the behaviors we are looking for are occurring, either by going out to classrooms and taking field notes on handheld apps or by reviewing system log data. Then we build models that identify those behaviors from system log data alone, and we verify that the models match expert labels on entirely new data. At this point, our models are as accurate as first-line medical diagnostics.
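To make the idea of reading disengagement from log data concrete, here is a toy sketch in Python. It is not Dr. Baker's method: his lab trains data-mined models against expert labels, whereas this example uses two hand-written rules. The `LogEvent` fields, the 3-second "fast" threshold, and the minimum run length of 3 are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogEvent:
    """One row of a (hypothetical) tutoring-system log."""
    action: str            # "attempt" or "hint"
    seconds: float         # time elapsed since the student's previous action
    correct: bool = False  # only meaningful for attempts


FAST_SECONDS = 3.0  # assumed cutoff for "too fast to be thinking"
MIN_RUN = 3         # assumed number of consecutive fast actions to raise a flag


def longest_fast_run(events: List[LogEvent],
                     matches: Callable[[LogEvent], bool]) -> int:
    """Length of the longest streak of consecutive fast events that match."""
    best = run = 0
    for event in events:
        if matches(event) and event.seconds < FAST_SECONDS:
            run += 1
            best = max(best, run)
        else:
            run = 0  # any slow or non-matching action breaks the streak
    return best


def flag_gaming(events: List[LogEvent]) -> Dict[str, bool]:
    """Flag rapid hint requests and rapid wrong answers in one student's log."""
    hint_run = longest_fast_run(events, lambda e: e.action == "hint")
    guess_run = longest_fast_run(
        events, lambda e: e.action == "attempt" and not e.correct
    )
    return {
        "hint_abuse": hint_run >= MIN_RUN,
        "systematic_guessing": guess_run >= MIN_RUN,
    }


# A student who cycles through answers in about a second each.
guesser = [
    LogEvent("attempt", 1.2),
    LogEvent("attempt", 0.9),
    LogEvent("attempt", 1.1),
    LogEvent("attempt", 1.0, correct=True),
]
print(flag_gaming(guesser))
```

Hand-written thresholds like these are brittle, which is one reason to prefer the approach described in the interview: models built from log data and checked against expert labels on data the model has never seen.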

Q: Have you studied differences in engagement across age groups and/or different cultures?

A: We have! Actually, the most disengaged population we have ever seen is medical residents learning to diagnose cancer. That’s kind of horrifying, but also unsurprising: medical residents are really busy, and have a lot of other things to think about beyond ongoing computer-based training. We’ve studied disengagement from preschoolers to middle school and high school students to undergraduates, medical residents, military cadets, and adults taking massive open online courses. Our work has involved learners from over 100 countries, with particularly in-depth work in the USA, Philippines, Mexico, India, China, the UAE, the UK, Brazil, Costa Rica, Belgium, and most recently Norway. We find that disengaged behaviors are relatively similar between countries, but the emotions and motivations that underlie the behaviors can differ a lot.

Q: Does your research help to identify and support students who are struggling or at risk?

A: Absolutely! Many of the behaviors we identify have now been shown to predict student success over a decade later. Our work with BrightBytes, for instance, identifies students who are at risk of dropping out of high school more than 15 percentage points better than earlier methods (see our peer-reviewed paper in the Proceedings of the International Conference on Educational Data Mining). It also combines those predictions with indicators of which factors were important to each prediction. For example, two students might both be at risk, but one has slipping grades whereas the other has increasing absenteeism.

Q: What’s your advice to a non-technical educator interested in obtaining and using this type of data?

A: If you aren’t technical, you don’t need to learn to do data mining yourself. Find a learning system or predictive analytics solution that meets your needs and let its provider supply you with the data. Platforms like BrightBytes, Civitas, ASSISTments, Cognitive Tutor, ALEKS, and many others offer comprehensive solutions that you may find useful.

I’m a big believer in combining predictive analytics on student success with reports for instructors, chief learning officers, guidance counselors, etc. By combining data mining’s ability to make predictions in complicated situations with human judgment and expertise in how to support learners, I think we can make a real difference in people’s lives.

Q: What aspect of your research do you find most exciting and what aspect is most frustrating to you?

A: I love the opportunity to learn things about learners that might have the potential to improve their lives. Even after 100 years of educational research, there’s still so much we don’t know, and we are at the forefront of discovering which behaviors hurt students more than we think (and conversely, which approaches, like taking a quick break, can be much better than we thought). The most frustrating aspect of my research is how slow the path can be from a new scientific finding to testing it in broader instructional practice. We need to do better at coupling scientific research in education with everyday practice!

For more information on the technical aspects of Dr. Baker’s work, please refer to some of his research papers listed below:

Educational Data Mining and Learning Analytics

Comparing machine learning to knowledge engineering for student behavior modelling: A case study in gaming the system

Population validity for Educational Data Mining models: A case study in affect detection

Evelyn Levine works as a Training and Staff Development Director for the U.S. Courts. She writes on worldwide learning and development trends in public and private sectors. She can be reached at