-
Peak Performance or Just Noise?
It's easy to look at machine learning leaderboards and assume raw scores tell the whole story. In this post, I compare statistical methods that cut through the noise and help us tell when one model is genuinely better than another.
-
Working with Large Virtual Chemical Libraries: Part 3 - Thompson Sampling for Classification
Exhaustively screening billion-compound virtual libraries would take decades, so we need smarter ways to hunt for molecules. I look at how we can adapt Thompson Sampling, a classic multi-armed bandit technique from reinforcement learning, using the Beta distribution to efficiently find active compounds without breaking our computers.
-
Interpretability vs. Explainability in Cheminformatics
Interpretability and explainability are different concepts in machine learning, yet many cheminformatics authors use the terms interchangeably.
-
Chemprop-RF: A Hybrid Approach to Chemical Property Prediction
Can we combine D-MPNNs and Random Forests to outperform either one on its own?
-
Drug Repurposing Using Artificial Intelligence
Finding new uses for existing, approved medications is a massive shortcut in drug discovery. After bad weather ruined my weekend hiking plans, I sat down to build an open-source deep learning workflow to virtually screen clinical libraries for hidden hits.
-
Building a Traffic Reminder Widget
I got tired of constantly checking Google Maps after 4 pm to guess my evening commute time, so I decided to automate it. Here is a look at a quick personal project that bridges WSL and Windows to fetch TomTom routing data and ping me with desktop notifications.
-
TabPFN for Chemical Datasets
TabPFN is a new transformer-based foundation model that, according to its authors, can make predictions on tabular data in a single, lightning-fast forward pass. I decided to put it to the test on several molecular property benchmarks to see how it holds up out of the box.
-
Working with Large Virtual Chemical Libraries: Part 2 - Genetic Algorithms
When a virtual library is far too large to screen one molecule at a time, genetic algorithms offer an elegant way out. In part two of this series, I explore how biologically inspired selection can navigate massive combinatorial spaces using only building-block data.
-
Displaying Distributions with Raincloud Plots
Every time I used violin plots in presentations, the feedback turned into a debate over whether they looked like sea creatures or medieval weapons. If you want a cleaner way to show your data, raincloud plots are an incredibly intuitive alternative that combines raw data points, box plots, and density curves beautifully.
-
Working with Large Virtual Chemical Libraries: Part 1 - Active Learning
If a computational scoring function takes just one second per molecule, screening a billion-compound library would take nearly 32 years. In part one of this series, I look at how active learning loops can train a machine learning model to hunt down the highest-scoring molecules without exhaustively scoring the whole library.
-
I Want to Become a Data Scientist, but I Have No Idea Where to Start...
When I first started looking into retraining as a data scientist, I felt completely lost. I wrote this post to share the exact things I wish I'd known before setting out, from picking the right Python courses to navigating bootcamps and getting that first bit of real-world experience.