data-science Archives - Pratima Satish's Blog

October 8, 2021

Letting ML algorithms dictate healthcare can be dangerous

I recently read an article in WIRED that gave me pause. I currently work at a company where we use machine learning models for checking symptoms. Our goal is to help patients understand the cause of their symptoms and where they can find appropriate care. We want to make the healthcare experience better for everyone!

August 20, 2020 (updated December 8, 2020)

Update about using ML models to predict student grades

I previously wrote about how there could be more transparency around how the International Baccalaureate (IB) program used ML models to predict student grades. There’s been an update since then. It turns out the UK government also used a similar algorithm to determine students’ A-level scores this year, which lowered about 40% of the grades teachers had predicted. Students protested all over the country because these grades affect where they are admitted for higher education, and the government eventually canceled the ML grades in favor of human-predicted ones.

July 22, 2020 (updated December 8, 2020)

Neural network for bread recipe generation – Part III

Welcome to the third and last part of the bread journey! In parts I and II, I wrote about how I scraped data from The Fresh Loaf to get recipes, explored and visualized the text data and used topic modeling to see what trends exist. In this post, I will describe how I trained two different language generation models to predict AI-based recipes for sourdough bread.

*On the left, a loaf of sourdough bread I made with a recipe generated from my neural network model. On the right, the code to analyze and predict bread recipes.*

July 22, 2020 (updated December 8, 2020)

Neural network for bread recipe generation – Part II

Welcome back! I previously described how I scraped the baking forum The Fresh Loaf, where people post their bread recipes, to get data to train a neural network to generate new bread recipes. I also detailed how I explored the data. In this post, I explain how I used some unsupervised learning techniques in the Natural Language Processing toolkit to further understand the textual data. Note: all the code I used for this project is in this repo.

A sourdough loaf I made with buckwheat groats to start us off!

July 22, 2020 (updated December 8, 2020)

Neural network for bread recipe generation – Part I

In 2017, a friend gave me some sourdough starter to make bread with, and ever since then, my life has changed. It sounds cheesy, but I discovered a hobby that has led me to buy almost 200 pounds of flour at a time (seriously), develop a biweekly pizza baking habit, and dream of what bread I’m going to make in the coming days!

Because I spend a lot of time baking sourdough and experimenting with new formulas, I wanted to see if I could create an artificial intelligence-powered recipe generator that would predict something for me to make! One of my go-to websites for technique, tips and tricks has been the helpful bread baking forum, The Fresh Loaf, where people ask questions and post recipes. My idea was to scrape this website and get data to train a neural network to generate new bread recipes – and that’s what I did. At the end of this project, I was able to achieve my goal: to bake a machine learning-inspired loaf of bread, with ingredients predicted with a neural network.

Since there are multiple components to this project, I am breaking them down in a few blog posts. All the code I used for the project is in this repo.

My walnut sourdough loaf, with the accompanying picture of the insides (called the crumb shot) showing purple streaks due to the tannins in walnuts. Chemistry!

July 10, 2020 (updated December 8, 2020)

How the International Baccalaureate Program can do a better job to use data science for deciding student grades

I read an interesting article today about how the International Baccalaureate (IB) program for high school education decided to cancel end-of-year tests for students and use a statistical model to predict final grades during the COVID-19 pandemic. As an online chemistry tutor for high school students in different programs, IB included, as well as a data scientist, I am a little shocked at how little transparency existed in the process to assign final grades to students. It has obviously caused confusion and distress, and led to offers from colleges and universities being rescinded, which should not be a desired consequence. This got me wondering — what kind of questions did the IB administrators and the unnamed company contracted to build the statistical model ask before releasing the final grades? I’m outlining at least some of the discussions I would make sure to have were I working in this team to solve this challenging problem.

July 9, 2020 (updated December 8, 2020)

Notes on Random Forests

I recently used k-dimensional trees to get decision boundaries in a very high-dimensional space and find the nearest neighbors for a given vector. I was curious to know what else these trees can be used for in machine learning. This led me to random forests, and I read a couple of blog posts to learn more about them. They seem like a really useful and robust way to approach classification problems, and I’ve jotted down a quick summary of decision trees and random forests. Note that this isn’t a complete description of how these methods are defined mathematically; instead, I’m writing quick refresher notes for working with these techniques. Without further ado, here they are!
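As a quick illustration of the k-dimensional tree lookup mentioned above, here is a minimal pure-Python sketch of building a tree and finding the nearest neighbor of a query vector. It is not the code from the original project: the names `Node`, `build_kdtree`, and `nearest`, and the toy 2-D points, are assumptions made up for this example.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node(NamedTuple):
    """One split in the tree (hypothetical structure for this sketch)."""
    point: Point
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(
    node: Optional[Node],
    query: Point,
    best: Optional[Tuple[float, Point]] = None,
) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the stored point closest to the query."""
    if node is None:
        return best
    dist = math.dist(node.point, query)
    if best is None or dist < best[0]:
        best = (dist, node.point)
    # Descend first into the side of the split that contains the query.
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than
    # the best match found so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Toy data, invented for this example.
points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree(points)
dist, point = nearest(tree, (9.0, 2.0))
print(point, round(dist, 3))  # (8.0, 1.0) 1.414
```

The pruning step is what makes the tree worthwhile: whole branches are skipped whenever the splitting plane is already farther away than the current best match.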