Machine Learning

May 13, 2026 • 1 min read

Peak Performance or Just Noise?

It's easy to look at machine learning leaderboards and assume raw scores tell the whole story. In this post, I compare statistical methods to cut through the noise and help us spot genuine model superiority.


Nov 26, 2025 • 17 min read

Working with Large Virtual Chemical Libraries: Part 3 - Thompson Sampling for Classification

Exhaustively screening billion-compound virtual libraries would take decades, so we need smarter ways to hunt for molecules. I look at how we can adapt Thompson Sampling, a classic multi-armed bandit technique from reinforcement learning, using the Beta distribution to efficiently find active compounds without breaking our computers.


Nov 9, 2025 • 7 min read

Interpretability vs. Explainability in Cheminformatics

Interpretability and explainability are different concepts in machine learning, yet many cheminformatics authors use the terms interchangeably.


Sep 12, 2025 • 12 min read

Chemprop-RF: A Hybrid Approach to Chemical Property Prediction

Can we combine D-MPNNs and Random Forests to outperform either model on its own?


May 3, 2025 • 11 min read

Drug Repurposing Using Artificial Intelligence

Finding new uses for existing, approved medications is a massive shortcut in drug discovery. After bad weather ruined my weekend hiking plans, I sat down to build an open-source deep learning workflow to virtually screen clinical libraries for hidden hits.


Jan 22, 2025 • 7 min read

TabPFN for Chemical Datasets

TabPFN is a new transformer-based foundation model whose authors claim it can make predictions on tabular data in a single, lightning-fast forward pass. I decided to put it to the test on several molecular property benchmarks to see how it holds up out of the box.


May 18, 2024 • 15 min read

Working with Large Virtual Chemical Libraries: Part 1 - Active Learning

If a computational scoring function takes just one second per molecule, screening a billion-compound library would take nearly 32 years. In part one of this series, I look at how we can use active learning loops to train a machine learning model that lets us intelligently hunt down the top-scoring molecules without exhaustively testing the whole library.
