Update about using ML models to predict student grades

I previously wrote about how there could be more transparency around how the International Baccalaureate (IB) program used ML models to predict student grades. There’s been an update since then: it turns out the UK government also used a similar algorithm to determine students’ A-level scores this year, downgrading roughly 40% of the grades teachers had predicted. Students protested across the country, since these grades affect where they are admitted for higher education, and the government eventually scrapped the algorithmic grades in favor of the teacher-predicted ones.

Continue reading “Update about using ML models to predict student grades”

How the International Baccalaureate Program can do a better job to use data science for deciding student grades

I read an interesting article today about how the International Baccalaureate (IB) program for high school education decided to cancel end-of-year tests for students and use a statistical model to predict final grades during the COVID-19 pandemic. As an online chemistry tutor for high school students in different programs, IB included, as well as a data scientist, I am a little shocked at how little transparency existed in the process of assigning final grades to students. It has clearly caused confusion and distress, and led to offers from colleges and universities being rescinded, which is hardly a desirable outcome. This got me wondering — what kind of questions did the IB administrators and the unnamed company contracted to build the statistical model ask before releasing the final grades? I’m outlining at least some of the discussions I would make sure to have were I working on this team to solve this challenging problem.

Continue reading “How the International Baccalaureate Program can do a better job to use data science for deciding student grades”

Notes on Random Forests

I recently used k-dimensional (k-d) trees to compute decision boundaries in a very high-dimensional space and to find the nearest neighbors of a given vector. I was curious about what else tree-based methods can be used for in machine learning. This led me to random forests, and I read a couple of blog posts to learn more about them. They seem like a really useful and robust approach to classification problems, so I’ve jotted down a quick summary of decision trees and random forests. Note that this isn’t a complete mathematical description of how these methods are defined; instead, these are quick refresher notes for working with the techniques. Without further ado, here they are!
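To make the two core ideas behind random forests concrete — bagging (training each tree on a bootstrap resample of the data) and random feature subsets per tree — here is a toy sketch in pure Python. It is not from the post and not a full implementation: each "tree" is just a one-split decision stump, and the dataset, function names, and parameters are all made up for illustration.

```python
import random
from collections import Counter

def best_stump(X, y, feat_indices):
    """Find the (feature, threshold) split minimizing misclassifications."""
    best, best_err = None, float("inf")
    for f in feat_indices:
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]   # majority label, left side
            rmaj = Counter(right).most_common(1)[0][0]  # majority label, right side
            err = sum(l != lmaj for l in left) + sum(r != rmaj for r in right)
            if err < best_err:
                best_err, best = err, (f, t, lmaj, rmaj)
    return best

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # common heuristic: ~sqrt(d) features per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap resample
        feats = rng.sample(range(d), k)                # random feature subset
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if stump:
            forest.append(stump)
    return forest

def predict(forest, x):
    """Each stump votes; the forest returns the majority class."""
    votes = [(lm if x[f] <= t else rm) for f, t, lm, rm in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy data: class 0 at small x[0], class 1 at large x[0] (hypothetical values)
X = [(1, 5), (2, 4), (1, 6), (8, 1), (9, 2), (7, 0)]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)
print(predict(forest, (1, 5)), predict(forest, (9, 1)))  # majority vote of the stumps
```

Real random forests grow full decision trees rather than stumps, but the averaging over many bootstrapped, feature-randomized learners is what gives the method its robustness.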

Trees on my hike in Santa Cruz
Continue reading “Notes on Random Forests”