Letting ML algorithms dictate healthcare can be dangerous

I recently read an article on WIRED that gave me pause. I currently work at a company where we use machine learning models for checking symptoms. Our goal is to help the patient understand the cause for their symptoms and where they can find appropriate care. We want to make the healthcare experience better for everyone!

Continue reading “Letting ML algorithms dictate healthcare can be dangerous”

Update about using ML models to predict student grades

I previously wrote about how there could be more transparency around how the International Baccalaureate (IB) program used ML models to predict student grades. There’s been an update since then. Turns out the UK government also used a similar algorithm to determine students’ A-level scores this year, resulting in a decrease of 40% of the teacher-predicted expected grades. There were protests all over the country from students as this affects where they are admitted for higher education, eventually leading the government to cancel the ML grades in favor of human-predicted ones.

Continue reading “Update about using ML models to predict student grades”

Neural network for bread recipe generation – Part III

Welcome to the third and last part of the bread journey! In parts I and II, I wrote about how I scraped data from The Fresh Loaf to get recipes, explored and visualized the text data and used topic modeling to see what trends exist. In this post, I will describe how I trained two different language generation models to predict AI-based recipes for sourdough bread.

On the left, a loaf of sourdough bread I made with a recipe generated from my neural network model. On the right, the code to analyze and predict bread recipes.
Continue reading “Neural network for bread recipe generation – Part III”

How the International Baccalaureate Program can do a better job to use data science for deciding student grades

I read an interesting article today about how the International Baccalaureate (IB) program for high school education decided to cancel end-of-year tests for students and use a statistical model to predict final grades during the COVID-19 pandemic. As an online chemistry tutor for high school students in different programs, IB included, as well as a data scientist, I am a little shocked at how little transparency existed in the process to assign final grades to students. It has obviously caused confusion and distress, and led to offers from colleges and universities being rescinded, which should not be a desired consequence. This got me wondering — what kind of questions did the IB administrators and the unnamed company contracted to build the statistical model ask before releasing the final grades? I’m outlining at least some of the discussions I would make sure to have were I working in this team to solve this challenging problem.

Continue reading “How the International Baccalaureate Program can do a better job to use data science for deciding student grades”

Notes on Random Forests

I recently used k-dimensional trees to get decision boundaries in a very high-dimensional space and find the nearest neighbors for a given vector. I was curious to know what else these trees can be used for in machine learning. This led me to random forests and I read up on a couple blog posts to learn more about them. They seem like a really useful and robust way to approach classification problems and I’ve jotted down a quick summary of decision trees and random forests. Note that this isn’t a complete description of how these methods are defined mathematically; instead, I’m writing quick refresher notes for working with these techniques. Without further ado, here they are!

Trees on my hike in Santa Cruz
Continue reading “Notes on Random Forests”